THE PRESENT STRUCTURE OF THE ASSOCIATION*


William G. Cochran
The Johns Hopkins University 


Five years ago, the Association adopted a new Constitution which
was intended to facilitate substantial changes in the nature of the 
Association. Written Constitutions are not noted for their ability to 
grip and hold the reader’s interest, and I doubt whether many members 
paid more attention to the new Constitution than was necessary in 
deciding how to vote on it in 1948. Consequently, I would like to pre- 
sent some impressions of the experience of the Association during the 
first five years of operation under the new Constitution. I hope that this 
account will give members a better picture of the present nature of the 
Association and will lead up to several questions concerning our future 
development about which I wish to encourage members to do some 
thinking. 


THE SITUATION AS IT APPEARED IN 1945 


Planning for a new Constitution began when the Association was 
able to resume normal activities towards the end of World War II. 
In the early discussions about a suitable future pattern for the Associ- 
ation, the committee at work on the new Constitution took note of 
four developments in the field of statistics that seemed relevant. 

1. Statistical techniques had penetrated into a great variety of fields. 
Up till about 30 years ago, practical statistics dealt mainly with 
applications to economics, business and government, and the in- 
terests of the Association’s members tended to reflect this fact. It 
is easy to exaggerate the extent to which this was so: the Association
has always welcomed statisticians in any field of knowledge
and 30 or 40 years ago the Journal was publishing important 
papers on a wide range of topics. But the organized activities of 
the Association dealt largely with applications in the economic 
sphere. In the 30’s, however, and still more in the early 40’s, the 
increased use of statistical ideas and techniques in such fields as 
psychology, the various branches of biology, medicine, the social 
sciences, industrial research and operations, and marketing was a 
striking phenomenon. 

2. During the same period, persons interested in these other developments
had founded a number of new societies, among them the
Institute of Mathematical Statistics, the Econometric Society, the 
Psychometric Society and the American Society for Quality Con- 
trol. All these societies were strongly concerned with statistical 
techniques, but none of them had any formal relation to the ASA. 

3. The membership of the ASA was increasing and might be expected
to grow rapidly in the post-war years. In 1945 there were about
3,300 members; at present there are close to 5,000.

4. With the formation of the United Nations, some of its agencies
might be expected to foster new developments in international 
statistics. 

In considering the future of the Association in the light of these 
factors, two principal choices appeared to be open. The Association 
might continue to give primary attention to applications in economics, 
leaving applications in other fields to be taken care of by other societies. 
This would have been a reasonable course of action. Although the As- 
sociation had received an influx of members whose interests were in 
other fields, the primary concern of over half the members in 1945 was 
still with applications to economics or business, as revealed by the 
1945 Directory. 

The second course, the one actually adopted, was to try to give the 
Association a central role with regard to all fields of application of sta- 
tistics. This decision was advocated by almost all members whose 
opinions were sought. It was a wise decision from many points of view, 
particularly when no one knew where important statistical applications 
might turn up next, when statistical activities were being parcelled 
out amongst numerous societies and when a strong national body 
might accomplish much in cooperation with international agencies. 
We should recognize, however, that the decision involved a real sacri- 
fice, at least for a time, by the members in economics and business, 
since a relatively homogeneous society catering satisfactorily to them 
was to be changed into something more amorphous whose future
course was harder to predict. These members accepted and encouraged 
the change with excellent spirit and with, as might be expected, occa- 
sional grumbles. 


SOME PROVISIONS OF THE 1948 CONSTITUTION 


The decision having been taken, the new Constitution was con- 
structed so as to introduce a number of devices that would make the 
desired changes easier to accomplish. I would like to describe the pur- 
poses, as I understand them, of some of the principal provisions in 
the 1948 Constitution. 

Associated and affiliated societies. One of the most difficult questions 
was: what was to be the relation between the ASA and the other so- 
cieties dealing with some aspect of statistics that had come into being or 
might be established in the future? Much thought was given to this 
question, including a study of various mechanisms that had been 
adopted by other large central organizations. Finally, it was decided 
to try two provisions, called association and affiliation. 

Any other society interested in the objects of the ASA may apply to 
become an Associated or an Affiliated Society. The status of an Associ- 
ated Society is intended for societies whose interest in statistics is 
strong; that of an Affiliated Society was intended to cover a looser
type of connection, but since this provision was dropped in our recent 
minor revision of the Constitution, I will not go into detail about it. 
Proposals for association are examined by our Board of Directors and 
Council before a decision is taken to grant the status. 

Each Associated Society receives the right to appoint two members 
to the Council of the ASA, one member to the editorial board of The 
American Statistician, and one member to the ASA Committee on 
Publications for each periodical which it publishes. The ASA is required 
to offer its publications to the members of Associated Societies on the 
same basis as to ASA members, and vice versa. 

The arrangement involves a slight loss of autonomy by the ASA. 
In return, it establishes a definite method of liaison, makes our Council
more representative of statistical interests as a whole, and puts us
in a better position to play the kind of central role that was considered 
desirable. 

Sections and section committees. If the ASA is to be a society whose 
members have a great variety of interests, what can be done to ensure 
that each of the principal interest-groups within the membership 
participates to its own satisfaction? 

For dealing with this problem, the ASA had a successful precedent 
in the Biometrics Section, which had been in existence for a number of
years. Although only a small fraction of the membership was interested 
in biometry as such, this Section arranged programs at each annual 
meeting, held joint sessions at the meetings of a number of the biologi- 
cal societies and published the Biometrics Bulletin with financial back- 
ing from the ASA. 

The 1948 Constitution encouraged the formation of Sections in other 
broad areas by providing for the establishment of Section Committees. 
The general function of Section Committees is “to further the develop- 
ment of statistics in fields not adequately covered at present by associ- 
ated or affiliated societies.” (Article X, 8). These Committees are 
represented on the ASA program committee in order to arrange pro- 
grams in their individual areas. In course of time, a Section Committee 
may draw up a charter which on approval leads to the formation of a 
Section. The new Constitution looks still further ahead by providing 
that when a Section has grown large enough, the Section Committee 
may take the initiative in organizing an Associated Society. 

Districts and District Committees. In nation-wide societies that are 
small, meetings tend to be on a national level. As the society grows in 
numbers, it becomes feasible to hold regional meetings which give more 
of the members a chance to participate. In the ASA we have been 
fortunate in having a long tradition of meetings both at the national 
level and through our Chapters at the local level. In order to encourage 
activities and meetings at an intermediate regional level, the Consti- 
tution provides for the setting-up of geographical districts. In each, 
there is a District Committee, with two members from each ASA 
chapter and from each local unit, if there are any, of any Associated or 
Affiliated Society. The District Committees thus provide a means for 
coordinating the activities of the ASA and related societies at both the
local and regional levels.

Council. Finally, in order to give the membership a broader repre- 
sentation in the administration of the ASA, the Constitution created 
a new policy-making body, the Council. This consists of the Board of 
Directors, the editor of each ASA publication, two representatives 
from each district and one from each Section Committee with more than 
75 members, as well as representatives of Associated Societies and an 
equal number of representatives-at-large. The Board of Directors, 
which in former times was the governing body, now serves as the 
executive committee of the Council. During 1953, the Council had 34 
members, as compared with 13 on the Board.


THE ASSOCIATION’S EXPERIENCE UNDER THE 1948 CONSTITUTION


I would now like to describe how the new devices have operated 
during the past 5 years. In cases where things have not as yet worked 
quite as actively as was hoped, I do not want to give the impression 
of washing dirty linen in public, which would be most reprehensible for 
a President. My defense would be that this linen is not dirty, and it is 
not being washed, but merely aired. 

Associated Societies. Up to the present time, only one organization 
has become linked to us through this provision—the East North Amer- 
ican Region of the Biometric Society, which might be regarded as one 
of our own children grown up, since the Biometric Society is a natural 
outgrowth of our Biometrics Section. 

This modest beginning is not surprising, because no strenuous efforts 
have been made to bring the provision to the attention of other socie- 
ties. In my opinion, it is advisable to wait until the ASA has settled 
down under the new Constitution before exploring with some of the 
other societies the possibility of a closer relationship, although we have 
progressed far enough so that any good opportunity for initiating dis- 
cussions should not be missed. Perhaps the most propitious times will 
be when cooperation has already arisen about some matter of mutual 
interest, or when a new society has been launched with the guidance 
of the ASA. With the older societies, we may also have to recognize 
and handle tactfully a problem of prestige. Some members of these 
societies may feel that Association implies in some way a recognition 
of a lower status. No such status was intended in framing these pro- 
visions, under which the ASA sacrifices some autonomy, but the other 
society does not, as is clearly stated in our Constitution. 

Sections and Section Committees. Excellent progress has been made in 
establishing a well-rounded group of Sections. This year, the Section 
on Social Statistics has been added to those on Biometrics, Business 
and Economic Statistics and Training in Statistics. A Committee on 
Statistics in the Physical Sciences has been at work for 2 years. Jointly 
these 5 areas appear comprehensive enough to cover the major inter- 
ests of practically all our members, at least for the time being. Perhaps 
the largest single group unrepresented by a Section are the members 
whose primary interest is in statistical theory. So long as the Institute 
of Mathematical Statistics continues to meet with us, as it has done 
consistently in the past, such members are unlikely to regard them- 
selves as neglected. In arranging the large number of sessions (cur- 
rently around 50) which now comprise the program at the annual 
meeting, the Section representatives have worked most efficiently and
amicably, and I believe that we have a smooth mechanism for ac- 
complishing this complicated task. The Section Committees have also 
been active in varying degrees in other projects, and have been called 
upon on numerous occasions for advice by the Board and Council. 

Districts and District Committees. Activity in arranging meetings of 
something approaching a regional character, which was one of the 
primary intentions in setting up districts, has proceeded satisfactorily. 
The initiative, however, has come from different directions on different 
occasions. The interesting programs at the United Nations head- 
quarters in New York in 1952 and 1953 were a joint venture by several 
Chapters. The successful series of Institutes at the Universities of
Illinois and Pennsylvania and at the Carnegie Institute of Technology
involved cooperative planning among a number of groups,
prominent among them being the Business and Economic Statistics 
Section. The regional meeting to be held in San Francisco in December, 
1954, will be the responsibility of the Western District. Thus, what was 
perhaps the principal object in setting up District Committees is being 
achieved, although the Committees themselves have not been uniformly 
active.

The Council. In creating the Council, the intent was to give the mem- 
bership a larger role in the policy-making of the ASA and perhaps also 
to allow for more deliberation on policy problems. I think it is fair to 
say that these aims have not been fulfilled thus far. The annual meet- 
ing of the Council takes place at the beginning of the new President’s 
term of office, a day or two after the new Council members have been 
elected. The agenda is a full one, with enough questions calling for im- 
mediate decision to leave little time or energy for leisurely discussion 
of long-range policy problems. The Board members tend to be the more
active participants in the discussion, because they are more familiar 
with the issues than those who are not Board members. 

It can be argued, of course, that if affairs are running smoothly 
without intense Council activity, as they appear to be, there is no point 
in looking for more work for the Council just to keep them busy. Also, 
a group with around 30 members is of an awkward size for some types 
of work and deliberation. The Council can meet at other times and can 
be polled by mail, so that it stands ready when any important policy 
matter arises. On the other hand, since the Council is our policy-making
body, our most representative body, and the body on which nominees 
from other societies will see us in action, there is a strong case for trying 
to make it more continuously effective. There are several techniques
that would be worth experimentation, and the Board has been consid- 
ering a plan of action. I am sorry that during my term of office I did not 
make a beginning. 


THE PRESENT STRUCTURE OF THE ASSOCIATION 


As indicated previously, the wording of the 1948 Constitution sug- 
gests that the ASA would assume a more definitely central role in sta- 
tistics by establishing, through association, links with other societies 
which recognized this role for the ASA. Section Committees were 
apparently regarded as more of an interim mechanism, since the Con- 
stitution describes them as applicable to “fields not adequately covered 
at present by associated or affiliated societies” and regards them as a
means for organizing an associated society. 

As events have turned out, the formation of Sections and Section 
Committees has been the predominant feature in the development of 
the ASA during the past five years, while only a bare beginning has 
been made in linking ourselves with other societies. This has been a 
sound order of procedure, in that we have been working hard to try 
to serve the whole range of statistics, before putting forward claims 
that we are able to do so. It now looks as if many of our most impor- 
tant activities during the next few years will be in the hands of the 
Sections. I hope that members of Section committees will realize how 
important these committees have become. Their useful activity is by 
no means confined to helping with the program at the Annual Meetings, 
but may include the planning of more specialized meetings, contribu- 
tions to the publication program of the ASA and factual studies of prob- 
lems that confront the content fields. 

As the Sections become larger and better established, what will be 
the next step in the evolution of the ASA? In particular, what will 
happen if a Section develops into an Associated Society or if a society
already in existence in the field of the Section becomes associated with 
us? I do not know the answer, but some recent experiences of the 
Biometrics Section are worth noting.

After the North American regions of the Biometric Society had 
been established, the members of the Biometrics Section began a 
lively discussion of the future of this Section. Some members contended 
that the Biometrics Section should be dissolved. They claimed that the 
new regions of the Biometric Society could take care of the welfare of 
biometry in this country, that their administration would to a large 
extent be in the hands of ASA members anyway, and that continuation
of the Biometrics Section would be an unnecessary duplication of
effort. 

An opposing view was that for a statistician, membership in the 
Biometric Society serves a different purpose from membership in the 
Biometrics Section. At present, about half the members of the Biometric 
Society are biologists. If this Society is to flourish in its original objec- 
tives, it must continue to attract to membership a large number, 
preferably a majority, of biologists who would not join any statistical 
association. Thus the Biometric Society gives the statistician the oppor- 
tunity to talk with biologists, learning their problems, working with 
them, and presenting new techniques for criticism and use. The ASA, 
on the other hand, is the place where statisticians in biometry can talk 
with statisticians in other content fields, both to find out what new 
techniques have developed in these fields and to present new ideas in 
biometry. From this point of view there was a strong argument for 
continuing the Biometrics Section as a nucleus for attracting future 
biometricians into the ASA, for cooperating with other Sections and 
for organizing programs on new or recent discoveries, where the techni- 
cal level would be too high for most biologists. 

After much debate, the decision was taken to continue the Bio- 
metrics Section. I do not claim that it was the argument given above 
which carried the day. Biometricians, like other statisticians, are fond 
of nice logical distinctions, and each tends to put forward a slightly dif- 
ferent reason for advocating the same decision, and to attach great 
importance to the superiority of his reason over anyone else’s, even 
though to an outsider the reasons are practically indistinguishable. 
But I hope that the argument will not be overlooked if other Sections 
blossom into full societies and their members are uncertain whether 
to continue the Section. If this concept of the purpose of a Section is 
sound, the greatest benefit will be obtained from the present ASA struc- 
ture only if there is sustained cooperation among Sections and if mem- 
bers make a habit of attending sessions of several different Sections. 
There is, of course, nothing to prevent a member from belonging to 
every Section. 

If the opposing view prevails, and if we are to look forward to seeing 
the Sections disband one by one as Associated Societies are formed 
(as might happen if there is a general lack of interest in continuing the 
Sections) then the structure of the ASA will evolve towards something 
different. A conservative might comment that it would then resemble 
either a jellyfish or an octopus, depending on how one looks at it. More 
seriously, I do not mean to suggest that Sections should be kept alive
if there is no intrinsic life in them. We should, however, have to re- 
examine the whole problem of the best type of structure for the ASA 
under the changed conditions. Actually, some types of organization 
that did not involve Sections at all were examined in the initial work 
for the 1948 Constitution, but were rejected as being unsuitable in our 
present state of growth. 


SOME QUESTIONS CONCERNING THE VITALITY OF THE ASSOCIATION 


To consider our present structure from a slightly different point of 
view, I would now like to pose a few broad questions which bear upon 
what might be called the state of health of the Association. 

Can the ASA maintain the enthusiastic support of its members? Any 
large and heterogeneous society is likely to find that it is nobody’s 
darling, because the affections of the members are accorded to some 
smaller and more homogeneous group in which they feel more at home. 
As the Association grows larger in its new role, it may be more difficult 
to give the members a real sense of participation. The Journal and The 
American Statistician, as the most tangible benefits from membership, 
have an important part to play, and it is currently planned to supple- 
ment these periodicals from time to time with special monographs and 
other publications of interest to the members. Meetings of a local or 
regional character are a beneficial addition to our Annual Meetings 
as a means of bringing together more of our members. Our Chapters 
and Sections may accomplish much in giving members a more immedi- 
ate focus for their interests. Continued joint activity by different Sec- 
tions will avoid a partitioning into self-contained groups that has 
occurred in some societies. In addition, I hope that members will con- 
tinue to agree that statistics needs an all-embracing society, and will 
appreciate that the Association will inevitably become more diffuse as 
it succeeds in adopting this role. 

Can the ASA continue to recruit young members? It is relatively pain- 
less for them to enter into membership: students pay only half the 
regular dues, as do also members under 30 during their first year. The 
office conducts a continuing campaign to spread information about 
membership, the groups approached being varied from year to year. 
As in other societies, our office finds that nothing succeeds so well as 
a personal approach from a present member, so that it is to our mem- 
bers and to the quality of our publications that we must look mainly 
for a steady recruitment of young persons. 

Does the structure of the ASA encourage younger members, as they mature,
to participate in the running of the ASA? Since the rapid growth of
statistics is recent, we suffer relatively little from government by the 
grey-haired. Nevertheless, many of our most experienced members are 
heavily burdened with activities on behalf of scientific societies. For 
this reason, as well as to keep us supplied with fresh points of view, the 
talents of younger members should be utilized to the fullest extent. 
The Chapters and the Section and District Committees provide the 
first opportunity for younger members to undertake responsible tasks. 
For service at the national level on the Council or Board, the problem 
of introducing new blood is more difficult. In the elections, which are 
by majority vote, my impression is that the candidate who is more
widely known (and usually older) is very frequently the winner. Some- 
thing can be done about this problem both by the Committee on Elec- 
tions when they nominate candidates and by the President when he 
appoints committees. 

Is the ASA able to stimulate new developments in statistics? Some mem- 
bers have expressed the opinion that in the thirties and early forties 
the ASA missed an opportunity by not playing a more prominent part 
in the developments which led to the formation of a number of other 
societies with statistical interests. I am not sure that I would agree. In 
the Biometric Society, which we did help to establish, I have been 
slightly disturbed in case the statisticians should play too prominent 
a role relative to the biologists. In founding this kind of a society, there 
is something to be said for leaving much of the initiative to the scien- 
tists in the subject-matter field, who would not in general be members 
of the ASA. Nevertheless, our assumption of the role of a central 
organization with very wide interests does carry more responsibility 
for helping such developments, rather than leaving them to take place 
outside the ASA. 

Here again we must rely mainly on the Section Committees, particu- 
larly when they arrange programs, to be on the lookout for new de- 
velopments. Inspection of the wide range of our programs in recent 
years suggests that the committees have been lively and enterprising in 
this respect. The Board and Council and the office can also help. For 
a time, the Board felt impelled to adopt a cautious policy owing to our 
financial difficulties, but fortunately these appear to be well out of the 
way. 

Is the ASA able to exercise leadership for statistics as a whole? So far as 
the use of statistics in government is concerned, our leadership is recog- 
nized as a result of a long history of disinterested service to agencies of 
the government. I believe that international statistical agencies would
also join in this recognition. 

How do we stand in other areas involving statistical interests? Are 
we active enough in exercising leadership? These questions are more 
troublesome. Two areas that have always been of deep concern to the 
Council and Board are that involving relations with the public and that 
involving Statistical Standards. A piece of sound and important statis- 
tical work may be subjected to unjustified public attack, or a piece of 
shoddy and unscrupulous work, masking as statistically sound, may 
threaten to bring discredit on the profession. Should such circum- 
stances arise, I imagine that most members would expect the Associ- 
ation to take corrective action. The problem of doing this effectively 
raises numerous difficulties. The critical moment for taking action may 
not be clear; there may be varying opinions about the most appropriate
type of action; and the pressure of time may prevent thorough study
of the issue before something must be done. For these reasons I am 
doubtful whether reliance on any standby body, such as the Council 
or some designated committee, will be adequate. The analogy with a 
fire brigade is not good, because nobody rings the alarm bell to tell us 
when to spring into action. The Council and Board have been strug- 
gling to consider what program of study might be initiated in order to 
establish a set of principles and a mode of action for dealing with such 
emergencies so that we will not be caught unawares. This is a task 
that needs all the help that members can give. For many of the prob- 
lems it seems clear that to be fully effective, the ASA must work along 
with other societies that have statistical interests. Consequently, a 
program of this kind may be one means of drawing us closer to these 
societies. 

Finally, any account of our present structure must recognize that 
we are a voluntary organization. Apart from a tiny office staff, every- 
thing that we do depends on the voluntary labor of the members. The 
Association can become what the members want it to be: there is no 
entrenched bureaucracy to impose its own pattern. Any member with 
a bright idea will receive an interested hearing (although he may sometimes
have to talk a little loudly in order to do so). If his idea is bright
enough, he will very likely find himself asked to carry it out as an enter- 
prise of the Association. Secondly, we are a scientific as distinct from 
a professional society, in the sense that the Association has always 
worked for the highest statistical standards rather than for the eco- 
nomic interests of its members. As we grow larger, it may be harder to 
retain this voluntary, scientific character while representing effectively
the whole range of statistical activities. For my part, I hope that we
can do both. 

To summarize, the ASA is in a difficult period of growth in trying to 
keep up with an extraordinary expansion of statistics which scarcely 
anyone could have predicted accurately. In particular, the increasing 
specialization within statistics has set up forces which tend to decrease 
the amount of common interest amongst members and to split them 
into separate groups. The task of serving all areas of application in 
this rapidly-changing environment will require us to be wide-awake, 
adaptable, and receptive to new ideas and new ventures. My own 
appraisal would be that during the past five years our Association has 
made gratifying progress, especially in view of the financial stringencies 
which inflation imposed upon us. Some of the provisions of the 1948 
Constitution have had only modest effects as yet, but these provisions 
have not proved harmful: they create mechanisms that will increase 
our flexibility in adapting ourselves to the future growth of the field 
of statistics. Although much remains to be done, I believe that we 
now have an organizational pattern that at least for the near future 
will enable us to take full advantage of our broad, common interests
while giving scope also to our more specialized interests.


PRINCIPLES OF SAMPLING*


William G. Cochran, Johns Hopkins University
Frederick Mosteller, Harvard University
John W. Tukey, Princeton University


I. SAMPLES AND THEIR ANALYSES 
1. Introduction 


Whether by biologists, sociologists, engineers, or chemists, sampling
is all too often taken far too lightly. In the early years of
the present century it was not uncommon to measure the claws and 
carapaces of 1000 crabs, or to count the number of veins in each of 1000 
leaves, and then to attach to the results the “probable error” which 
would have been appropriate had the 1000 crabs or the 1000 leaves 
been drawn at random from the population of interest. Such actions 
were unwarranted shotgun marriages between the quantitatively un- 
sophisticated idea of sample as “what you get by grabbing a handful” 
and the mathematically precise notion of a “simple random sample.”
In the years between we have learned caution by bitter experience. 
We insist on some semblance of mechanical (dice, coins, random num- 
ber tables, etc.) randomization before we treat a sample from an exist- 
ent population as if it were random. We realize that if someone just 
“grabs a handful,” the individuals in the handful almost always re- 
semble one another (on the average) more than do the members of a 
simple random sample. Even if the “grabs” are randomly spread around 
so that every individual has an equal chance of entering the sample, 
there are difficulties. Since the individuals of grab samples resemble 
one another more than do individuals of random samples, it follows 
(by a simple mathematical argument) that the means of grab samples 
resemble one another less than the means of random samples of the 
same size. From a grab sample, therefore, we tend to underestimate 
the variability in the population, although we should have to over- 
estimate it in order to obtain valid estimates of variability of grab 
sample means by substituting such an estimate into the formula for 
the variability of means of simple random samples. Thus using simple 
random sample formulas for grab sample means introduces a double
bias, both parts of which lead to an unwarranted appearance of higher
stability.

* This paper will constitute Appendix G of Cochran, Mosteller, and Tukey, Statistical Problems
of the Kinsey Report, to be published by the American Statistical Association later this year as a
monograph. The main body of this monograph was published in the Journal last December (Vol. 48
(1953), pp. 673-716).
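The double bias can be seen in a small simulation. This is a minimal sketch under assumed conditions, not a procedure from the text: a hypothetical population of 200 clusters ("handfuls") of 50 individuals, each individual equal to a cluster effect plus individual noise, and a "grab" that picks a cluster at random and then 10 of its members, so that every individual still has the same chance of selection. The names `build_population` and `compare` and all the numeric settings are illustrative.

```python
import random
import statistics

def build_population(n_clusters=200, cluster_size=50, seed=1):
    """Hypothetical clustered population: members of a cluster share a
    common cluster effect, so they resemble one another."""
    rng = random.Random(seed)
    clusters = []
    for _ in range(n_clusters):
        effect = rng.gauss(0.0, 1.0)  # spread between clusters
        # spread of individuals around their own cluster's effect
        clusters.append([effect + rng.gauss(0.0, 0.5) for _ in range(cluster_size)])
    return clusters

def compare(n=10, reps=2000, seed=2):
    clusters = build_population()
    population = [y for cluster in clusters for y in cluster]
    rng = random.Random(seed)
    grab_means, srs_means, grab_within = [], [], []
    for _ in range(reps):
        # "Grab": a random cluster, then n of its members at random.
        grab = rng.sample(rng.choice(clusters), n)
        srs = rng.sample(population, n)  # simple random sample of n
        grab_means.append(statistics.mean(grab))
        srs_means.append(statistics.mean(srs))
        grab_within.append(statistics.variance(grab))
    within = statistics.mean(grab_within)
    return {
        "population variance": statistics.pvariance(population),
        "average within-grab variance": within,
        "actual variance of grab means": statistics.pvariance(grab_means),
        "actual variance of SRS means": statistics.pvariance(srs_means),
        "SRS formula applied to grabs": within / n,
    }

for label, value in compare().items():
    print(f"{label:32s} {value:.3f}")
```

With these assumed settings the within-grab variance falls well below the population variance, grab means vary far more than simple random sample means, and the simple random sample formula applied to a grab understates the variability of grab means by a wide margin: both parts of the bias point toward apparent stability.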

Returning to the crabs, we may suppose that the crabs in which we 
are interested are all the individuals of a wide-ranging species, spread 
along a few hundred miles of coast. It is obviously impractical to seek 
to take a simple random sample from the species—no one knows how 
to give each crab in the species an equal chance of being drawn into 
the sample (to say nothing of trying to make these chances inde- 
pendent). But this does not bar us from honestly assessing the likely 
range of fluctuation of the result. Much effort has been applied in 
recent years, particularly in sampling human populations, to the de- 
velopment of sampling plans which simultaneously, 

(i) are economically feasible,

(ii) give reasonably precise results, and 

(iii) show within themselves an honest measure of fluctuation of 
their results. 

Any excuse for the dangerous practice of treating non-random samples 
as random ones is now entirely tenuous. Wider knowledge of the princi- 
ples involved is needed if scientific investigations involving samples 
(and what such investigation does not?) are to be solidly based. Addi- 
tional knowledge of techniques is not so vitally important, though it 
can lead to substantial economic gains. 

A botanist who gathered 10 oak leaves from each of 100 oak trees 
might feel that he had a fine sample of 1000, and that, if 500 were in- 
fected with a certain species of parasites, he had shown that the per- 
centage infection was close to 50%. If he had studied the binomial 
distribution he might calculate a standard error according to the usual 
formula for random samples, $p \pm \sqrt{pq/n}$, which in this case yields
50 ± 1.6% (since p = q = .5 and n = 1000). In doing this he would neglect
three things: 

(i) Probable selectivity in selecting trees (favoring large trees, per- 
haps?), 

(ii) Probable selectivity in choosing leaves from a selected tree 
(favoring well-colored or, alternatively, visibly infected leaves 
perhaps), and 

(iii) the necessary allowance, in the formula used to compute the 
standard error, for the fact that he has not selected his leaves 
individually at random, as the mathematical model for a simple 
random sample prescribes. 

Most scientists are keenly aware of the analogs of (i) and (ii) in their 
own fields of work, at least as soon as they are pointed out to them. 
Far fewer seem to realize that, even if the trees were selected at random
from the forest and the leaves were chosen at random from each 
selected tree, (iii) must still be considered. But if, as might indeed 
be the case, each tree were either wholly infected or wholly free of 
infection, then the 1000 leaves tell us no more than 100 leaves, one from 
each tree. (Each group of 10 leaves will be all infected or all free of
infection.) In this case we should take n = 100 and find an infection rate
of 50 ± 5%.
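The two figures quoted for the botanist follow from the binomial formula with n = 1000 and n = 100. The sketch below computes both and, as a hedged bridge between them, uses the design-effect approximation 1 + (m - 1)ρ for equal clusters of m leaves with intraclass correlation ρ. That approximation is a standard textbook result assumed here, not something stated in the text, and the function names are hypothetical. Setting ρ = 0 recovers the naive 1.6%; ρ = 1 (each tree wholly infected or wholly free) recovers the 5%.

```python
import math

def srs_se(p, n):
    """Binomial standard error sqrt(pq/n) for a simple random sample."""
    return math.sqrt(p * (1 - p) / n)

def cluster_se(p, n_clusters, m, rho):
    """Approximate SE for n_clusters equal clusters of m elements, inflating
    the SRS variance by the design effect 1 + (m - 1) * rho (assumed model)."""
    deff = 1 + (m - 1) * rho
    return srs_se(p, n_clusters * m) * math.sqrt(deff)

p = 0.5
print(f"naive, n = 1000: {100 * srs_se(p, 1000):.1f}%")                          # 1.6%
print(f"rho = 0: {100 * cluster_se(p, 100, 10, 0.0):.1f}%")                      # 1.6%
print(f"rho = 1 (all-or-none trees): {100 * cluster_se(p, 100, 10, 1.0):.1f}%")  # 5.0%
```

Intermediate values of ρ give standard errors between these extremes, which is the "less extreme case" that easily escapes detection.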

Such an extreme case of increased fluctuation due to sampling in 
clusters would be detected by almost all scientists, and is not a serious 
danger. But less extreme cases easily escape detection and may there- 
fore be very dangerous. This is one example of the reasons why the 
principles of sampling need wider understanding. 

We have just described an example of cluster sampling, where the 
individuals or sampling units are not drawn into the sample inde- 
pendently, but are drawn in clusters, and have tried to make it clear 
that “individually at random” formulas do not apply. It was not our 
intention to oppose, by this example, the use of cluster sampling, which 
is often desirable, but only to speak for proper analysis of its results. 


2. Self-weighting probability samples


There are many ways to draw samples such that each individual 
or sampling unit in the population has an equal chance of appearing 
in the sample. Given such a sample, and desiring to estimate the popu- 
lation average of some characteristic, the appropriate procedure is to 
calculate the (unweighted) mean of all the individual values of that 
characteristic in the sample. Because weights are equal and require no 
obvious action, such a sample is self-weighting. Because the relative 
chances of different individuals entering the sample are known and 
compensated for (are, in this case, equal), it is a probability sample. 
(In fact, it would be enough if we knew somewhat less, as is explained 
in Section 5.) 

Such a sample need not be a simple random sample, such as one 
would obtain by numbering all the individuals in the population, and 
then using a table of random numbers to select the sample on the basis: 
one random number, one individual. We illustrate this by giving vari- 
ous examples, some practical and others impractical. 

Consider the sample of oak leaves; it might in principle be drawn 
in the following way. First we list all trees in the forest of interest, 
recording for each tree its location and the number of leaves it bears. 
Then we draw a sample of 100 trees, arranging that the probability 
of a tree’s being selected is proportional to the number of leaves which
it bears. Then on each selected tree we choose 10 leaves at random. 
It is easy to verify that each leaf in the forest has an equal chance of 
being selected. (This is a kind of two-stage sampling with probability 
proportional to size at the first stage.) 
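The claim that each leaf has an equal chance can be checked arithmetically. This sketch simplifies by drawing trees with replacement, so that a tree with L leaves is expected to be drawn n·L/(total leaves) times, and then taking 10 leaves at random from it; the product is the same for every leaf in the forest. The leaf counts and the function name `expected_leaf_selections` are hypothetical, and exact fractions avoid rounding noise.

```python
from fractions import Fraction

def expected_leaf_selections(leaf_counts, n_draws, leaves_per_tree=10):
    """Expected number of times each leaf is drawn when trees are drawn
    n_draws times WITH replacement, with probability proportional to leaf
    count, and leaves_per_tree leaves are then chosen at random from each
    drawn tree. Returns one value per tree (the same for all its leaves)."""
    total = sum(leaf_counts)
    result = []
    for leaves in leaf_counts:
        tree_draws = Fraction(n_draws * leaves, total)   # stage 1: PPS
        leaf_chance = Fraction(leaves_per_tree, leaves)  # stage 2: random leaves
        result.append(tree_draws * leaf_chance)
    return result

forest = [5_000, 12_000, 800, 30_000, 2_200]   # hypothetical leaf counts
print(expected_leaf_selections(forest, n_draws=2))
# every entry is Fraction(1, 2500), i.e. n_draws * 10 / total leaves
```

The leaf count cancels between the two stages, which is exactly why the design is self-weighting.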

We must emphasize that such terms as “select at random,” “choose 
at random,” and the like, always mean that some mechanical device, 
such as coins, cards, dice, or tables of random numbers, is used. 

A more practical way to sample the oak leaves might be to list only 
the locations of the trees (in some parts of the country this could be 
done from a single aerial photograph), and then to draw 100 trees in 
such a way that each tree has an equal chance of being selected. The 
number of leaves on each tree is now counted and the sample of 1000 
is prorated over the 100 trees in proportion to their numbers of leaves. 
It is again easy to verify that each leaf has an equal chance of appear- 
ing in the sample. (This is a kind of two-stage sampling with prob- 
ability proportional to size at the second stage.) 

If the forest is large, and each tree has many leaves, either of these 
procedures would probably be impractical. A more practical method 
might involve a four-stage process in which: 

(a) the forest is divided into small tracts, 

(b) each tract is divided into trees, 

(c) each tree is divided into recognizable parts, perhaps limbs, and 

(d) each part is divided into leaves. 

In drawing a sample, we would begin by drawing a number of tracts, 
then a number of trees in each tract, then a part or number of parts 
from each tree, then a number of leaves from each part. This can be 
done in many ways so that each leaf has an equal chance of appearing 
in the sample. 

A different sort of self-weighting probability sample arises when we 
draw a sample of names from the Manhattan telephone directory, 
taking, say, every 17,387th name in alphabetic order starting with one 
of the first 17,387 names selected at random with equal probability. 
It is again easy to verify that every name in the book has an equal 
chance of appearing in the sample (this is a systematic sample with a 
random start, sometimes referred to as a systematic random sample). 
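A systematic sample with a random start takes one line of slicing. In this sketch the directory is a hypothetical list of 100 names and the interval is 7 rather than 17,387; enumerating every possible start shows that each name falls in exactly one of them, so its chance of selection is exactly 1/k.

```python
import random
from collections import Counter

def systematic_sample(names, k, rng=random):
    """Every k-th name in order, starting from one of the first k names
    chosen at random with equal probability."""
    start = rng.randrange(k)
    return names[start::k]

directory = [f"name{i:03d}" for i in range(100)]   # hypothetical directory
sample = systematic_sample(directory, 7, random.Random(0))

# Over all 7 possible random starts, each name turns up in exactly one,
# so its chance of selection is exactly 1/7.
appearances = Counter(n for s in range(7) for n in directory[s::7])
print(len(sample), set(appearances.values()))
```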

As a final example of this sort, we may consider a national sample 
of 480 people divided among the 48 states. We cannot divide the 480 
cases among the individual states in proportion to population very 
well, since Nevada would then receive about one-half of a case. If we 
group the small states into blocks, however, we can arrange for each
state or block of states to be large enough so that on a pro rata basis 
it will have at least 10 cases. Then we can draw samples within each 
state or block of states in various ways. It is easy to verify that the 
chances of any two persons entering such a sample (assuming adequate 
randomness within each state or block of states) are approximately 
the same, where the approximation arises solely because a whole num- 
ber of cases has to be assigned to each state or block of states. (This is 
a rudimentary sort of stratified sample.) 

All of these examples were (at least approximately) self-weighting 
probability samples, and all yield honest estimates of population
characteristics. Each one requires a different formula for assessing the
stability of its results! Even if the population characteristic studied is a
fraction, almost never will

$$p \pm \sqrt{\frac{pq}{n}}$$

be a proper expression for “estimate ± standard error.” In every case,
a proper formula will require more information from the sample than
merely the overall percentage. (Thus, for instance, in the first oak
leaf example, the variability from tree to tree of the number infested
out of 10 would be needed.)
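One way to use that tree-to-tree variability is sketched below, under the assumption that trees are drawn with probability proportional to size and with replacement. That assumption makes the trees the independent units and the estimate a simple mean of per-tree rates, so its standard error is the standard deviation of those rates divided by the square root of the number of trees. This with-replacement estimator is a standard approximation rather than a formula given in the text, and `estimate_with_se` is a hypothetical name.

```python
import math
import statistics

def estimate_with_se(infected_counts, leaves_per_tree=10):
    """Infection rate and its standard error from per-tree counts of
    infected leaves; trees, not leaves, are the independent units."""
    rates = [c / leaves_per_tree for c in infected_counts]
    p_hat = statistics.mean(rates)
    se = statistics.stdev(rates) / math.sqrt(len(rates))
    return p_hat, se

# Extreme case from the text: 50 trees wholly infected, 50 wholly free.
p_hat, se = estimate_with_se([10] * 50 + [0] * 50)
print(f"{100 * p_hat:.0f}% +/- {100 * se:.1f}%")   # 50% +/- 5.0%
```

For the all-or-none case this gives about 5%, matching the n = 100 figure rather than the naive 1.6%; the small excess over exactly 5% comes from the n - 1 divisor in the standard deviation.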


3. Representativeness 


Another principle which ought not to need recalling is this: By sam- 
pling we can learn only about collective properties of populations, not 
about properties of individuals. We can study the average height, the 
percentage who wear hats, or the variability in weight of college 
juniors, or of University of Indiana juniors, or of the juniors belonging 
to a certain fraternity or club at a certain institution. The population 
we study may be small or large, but there must be a population—and 
what we are studying must be a population characteristic. By sam- 
pling, we cannot study individuals as particular entities with unique 
idiosyncrasies; we can study regularities (including typical variabilities 
as well as typical levels) in a population as exemplified by the individ- 
uals in the sample. 

Let us return to the self-weighted national sample of 480. Notice 
that about half of the times that such a sample is drawn, there will be 
no one in it from Nevada, while almost never will there be anyone from 
Esmeralda County in that state. Local pride might argue that “this 
proves that the sample was unrepresentative,” but the correct position
seems to be this: 

(i) the particular persons in the sample are there by accident, and 
this is appropriate, so far as population characteristics are con- 
cerned, 

(ii) the sampling plan is representative since each person in the U.S. 
had an equal chance of entering the sample, whether he came 
from Esmeralda County or Manhattan. 

That which can be and should be representative is the sampling plan, 
which includes the manner in which the sample was drawn (essentially 
a specification of what other samples might have been drawn and what 
the relative chances of selection were for any two possible samples) and 
how it is to be analyzed. 

However great their local pride, the citizens of Esmeralda County, 
Nevada, are entitled to representation in a national sampling plan 
only as individual members of the U.S. population. They are not en- 
titled to representation as a group, or as particular individuals—only 
as individual members of the U.S. population. The same is true of the 
citizens of Nevada, who are represented in only half of the actual 
samples. The citizens of Nevada, as a group, are no more and no less 
entitled to representation than any other group of equal size in the 
U.S., whether geographical, racial, marital, criminal, selected at
random, or selected from those not in a particular national sample.

It is clear that many such groups fail to be represented in any par- 
ticular sample, yet this is not a criticism of that sample. Representa- 
tion is not, and should not be, by groups. It is, and should be, by in- 
dividuals as members of the sampled population. Representation is not, 
and should not be, in any particular sample. It is, and should be, in the 
sampling plan. 


4. One method of assessing stability 


Because representativeness is inherent in the sampling plan and not 
in the particular sample at hand, we can never make adequate use of 
sample results without some measure of how well the results of this 
particular sample are likely to agree with the results of other samples 
which the same sampling plan might have provided. The ability to 
assess stability fairly is as important as the ability to represent the 
population fairly. Modern sampling plans concentrate on both. 

Such assessment must basically be in terms of sample results, since 
these are usually our most reliable source of information about the 
population. There is no reason, however, why assessment should depend
only on the sample size and the overall (weighted) sample mean
for the characteristic considered. These two suffice when measuring 
percentages with a simple random sample, but in almost all other 
cases the situation is more complex. 

It would be too bad if, every time such samples were used, the user 
had to consult a complicated table of alternative formulas, one for 
each plan, before calculating his standard errors. (These formulas do 
need to be considered whenever we are trying to do a really good job 
of maximum stability for minimum cost—considered very carefully in 
selecting one complex design in preference to another.) Fortunately, 
however, this complication can often be circumvented. 

One of the simplest ways is to build up the sample from a number of 
independent subsamples, each of which is self-sufficient, though small, 
and to tabulate the results of interest separately for each subsample. 
Then variation among separate results gives a simple and honest yard- 
stick for the variability of the result or results obtained by throwing 
all the samples together. Such a sampling plan involves interpenetrating 
replicate subsamples. 

All of us can visualize interpenetrating replicate subsamples when 
the individuals or sampling units are drawn individually at random. 
Some examples in more complex cases may be helpful. In the first oak 
leaf example, we might select randomly, not one sample of 100 trees, 
but 10 subsamples of 10 trees each. If we then pick 10 leaves at random 
from each tree, placing them in 10 bags, one for each subsample, and 
tabulate the results separately, bag by bag, we will have 10 inter- 
penetrating replicate subsamples. Similarly, if we were to pick 10 sub- 
samples out of the Manhattan phone book, with each subsample con- 
sisting of every 173,870th name (in alphabetic order) and with the 10 
lead names of the 10 subsamples selected at random from the first 
173,870 names, we would again have 10 interpenetrating replicate
subsamples.

We can always analyze 10 results from 10 independent interpenetrating
replicate subsamples just as if they were 10 randomly selected individual
measurements, and proceed similarly with other numbers of replicate
subsamples.
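Treating replicate results as independent measurements reduces the stability assessment to a mean and its standard error. A minimal sketch, assuming equal-sized self-weighting replicates (so that the mean of the replicate results equals the result for the pooled sample); the ten bag percentages and the name `combine_replicates` are hypothetical.

```python
import math
import statistics

def combine_replicates(replicate_results):
    """Overall result = mean of the k replicate results; its standard
    error = standard deviation of those results over sqrt(k)."""
    k = len(replicate_results)
    overall = statistics.mean(replicate_results)
    se = statistics.stdev(replicate_results) / math.sqrt(k)
    return overall, se

# Hypothetical infection rates (%) from 10 bags of 100 leaves each.
bags = [46.0, 55.0, 51.0, 43.0, 58.0, 49.0, 52.0, 47.0, 54.0, 45.0]
overall, se = combine_replicates(bags)
print(f"{overall:.1f}% +/- {se:.1f}%")   # 50.0% +/- 1.5%
```

No design-specific formula is needed: whatever clustering or systematic selection happens inside each replicate is already reflected in how much the replicate results differ.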


5. General probability samples 


The types of sample described in the last section are not the only 
kinds from which we can confidently make inferences from the sample 
to the population of interest. Besides the trivial cases where the sample 
amounts to 90% or even 95% of the population, there is a broad class 
of cases, including those of the last section as special cases. This is the
class of general probability samples, where:

(1) There is a population, the sampled population, from which the 
sample is drawn, and each element of which has some chance of 
entering the sample. 

(2) For each pair of individuals or sampling units which are in the 
actual sample, the relative chances of their entering the sample 
are known. (This implies that the sample was selected by a proc- 
ess involving one or more steps of mechanical randomization.) 

(3) In the analysis of the actual sample, these relative chances have 
been compensated for by using relative weights such that 

(relative chance) times (relative weight) equals a constant. 

(4) For any two possible samples, the sum of the reciprocals of the
relative weights of all the individuals in the sample is the same.

(Conditions (3) and (4) can be generalized still further.) In practice,
of course, we ask only that these four conditions hold with a
sufficiently high degree of approximation.

We have made the sampling plan representative, not by giving each 
individual an equal chance to enter the sample and then weighting 
them equally, but by a more noticeable process of compensation, where 
those individuals very likely to enter the sample are weighted less, 
while those unlikely to enter are weighted more when they do appear. 
The net result is to give each individual an equal chance of affecting 
the (weighted) sample mean. 
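The compensation described above amounts to a weighted mean whose weights are inversely proportional to each individual's relative chance of selection, so that (relative chance) times (relative weight) is constant, as condition (3) requires. A minimal sketch of that ratio form, with a hypothetical function name and toy numbers:

```python
from fractions import Fraction

def weighted_sample_mean(values, chances):
    """Weighted mean with relative weight 1/(relative chance), so that
    (relative chance) * (relative weight) is the same for everyone."""
    weights = [Fraction(1) / c for c in chances]
    total = sum(w * y for w, y in zip(weights, values))
    return total / sum(weights)

# Hypothetical sample: two members of an oversampled group (3 times the
# usual chance) with value 1, and two ordinary members with value 0.
values = [1, 1, 0, 0]
chances = [3, 3, 1, 1]
print(weighted_sample_mean(values, chances))   # 1/4, versus 1/2 unweighted
```

When every chance is equal the weights cancel and this reduces to the ordinary unweighted mean of a self-weighting sample.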

Such general probability samples are just as honest and legitimate as 
the self-weighting probability samples. They often offer substantial ad- 
vantages in terms of higher stability for lower cost. 

We can alter our previous examples, so as to make them examples of 
general, and not of self-weighting, probability samples. Take first the 
oak leaf example. We might proceed as follows: 

(1) locate all the trees in the forest of interest,

(2) select a sample of trees at random,

(3) for each sampled tree, choose 10 leaves at random and count (or
estimate) the total number of leaves,

(4) form the weighted mean by summing the products

(fraction of 10 leaves infested) times (number of leaves on the tree)

and then divide by the total number of leaves on the 100 trees
in the sample.

When we selected trees at random, each tree had an equal probability
of selection. When we chose 10 leaves from a tree at random, the
chance of getting a particular leaf was

$$\frac{10}{\text{number of leaves on the tree}}.$$

Thus the chance of selecting any one leaf was a constant multiple of
this and was proportional to the reciprocal of the number of leaves of
the tree. Hence the correct relative weight is proportional to the number
of leaves on the tree, and it is simplest to take it as 1/10 of that
number. After all, summing the products

(fraction of 10 infected) times (leaves on tree)

or

(1/10) times (number out of 10 infected) times (leaves on tree)

over all trees in the sample gives the same answer, and this answer is
also obtained by summing, over all the leaves in the sample,

(1/10) times (number out of 1 infected) times (leaves on tree)

or

$$\frac{\text{leaves on tree}}{10} \times (\text{number out of 1 infected}),$$

which shows that the weighted mean prescribed above is just what
would have been obtained with relative weights of (number of leaves
on tree)/10.
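The equivalence can be checked numerically: computing the estimate tree by tree and leaf by leaf, with each sampled leaf carrying relative weight (leaves on tree)/10, gives the same number. The four sampled trees and both function names are hypothetical.

```python
def tree_level_mean(trees):
    """trees: (infected_out_of_10, leaves_on_tree) pairs. Sum of
    (fraction of 10 infected) * (leaves on tree) over the sampled trees,
    divided by the total number of leaves on those trees."""
    num = sum((infected / 10) * leaves for infected, leaves in trees)
    return num / sum(leaves for _, leaves in trees)

def leaf_level_mean(trees):
    """Same estimate leaf by leaf: each sampled leaf is 1 (infected) or 0,
    with relative weight (leaves on tree) / 10."""
    num = den = 0.0
    for infected, leaves in trees:
        w = leaves / 10
        for j in range(10):
            y = 1 if j < infected else 0   # "number out of 1 infected"
            num += w * y
            den += w
    return num / den

sample = [(3, 4_000), (0, 1_500), (10, 9_000), (6, 2_500)]  # hypothetical
print(round(tree_level_mean(sample), 4), round(leaf_level_mean(sample), 4))
# both about 0.6882
```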

If in sampling the names in the Manhattan telephone directory, we 
desired to sample initial letters from P through Z more heavily, we 
might proceed as follows: 

(1) Select one of the first 17,387 names at random with equal
probability as the lead name.

(2) Take the lead name, and every 17,387th name in alphabetic
order following it, into the sample.

(3) Take every name which begins with P, Q, R, S, …, Z and is
the 103rd or 207th name after a name selected in step 2 of the
sample.

Each name beginning with A, B, …, N, O has a chance of 1/17,387
of entering the sample. Each name beginning with P, Q, …, Y, Z
has a chance of 3/17,387 of entering the sample (it enters if any one of
three names among the first 17,387 is selected as the lead name). Thus
the relative weight in the sample of a name beginning with A, B, …,
N, O is 3 times that of a name beginning with P, Q, …, Y, Z. The
weighted mean is found simply as:

$$\frac{3\,(\text{sum for A, B, \ldots, N, O's}) + (\text{sum for P, Q, \ldots, Y, Z's})}{3\,(\text{A, B, \ldots, N, O's in sample}) + (\text{P, Q, \ldots, Y, Z's in sample})}$$
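A scaled-down sketch of this design: a hypothetical 90-name directory, interval K = 9 in place of 17,387, and extra offsets 3 and 6 in place of 103 and 207. Enumerating all K lead names confirms that, away from the very start of the list, P–Z names are taken three times as often as A–O names, which is why A–O names carry relative weight 3. All names and constants are illustrative assumptions.

```python
from collections import Counter

K = 9             # sampling interval (17,387 in the text)
OFFSETS = (3, 6)  # extra positions for P-Z names (103 and 207 in the text)

def is_late(name):
    """True for names beginning with P, Q, ..., Z (uppercase initials)."""
    return "P" <= name[0] <= "Z"

def draw(names, start):
    """Indices selected when the lead name is names[start], 0 <= start < K."""
    picked = set()
    for i in range(start, len(names), K):   # steps (1) and (2)
        picked.add(i)
        for off in OFFSETS:                   # step (3)
            j = i + off
            if j < len(names) and is_late(names[j]):
                picked.add(j)
    return picked

def weighted_mean(names, values, picked):
    """Relative weight 3 for A-O names, 1 for P-Z names."""
    num = den = 0
    for i in picked:
        w = 1 if is_late(names[i]) else 3
        num += w * values[i]
        den += w
    return num / den

# Hypothetical 90-name directory with initials cycling through the alphabet.
names = [chr(ord("A") + (7 * i) % 26) + f"-{i}" for i in range(90)]
values = [1 if is_late(n) else 0 for n in names]   # hypothetical 0/1 trait
counts = Counter(i for s in range(K) for i in draw(names, s))
print(weighted_mean(names, values, draw(names, start=4)))
```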
Finally we may wish to distribute our national sample of 480 with
10 in each state. The analysis exactly parallels the oak leaf case, and 
we have to form the sum of 

(mean for state sample) times (population of state) 
and then to divide by the population of the U.S. 
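The national estimate is a population-weighted average of the state means. A minimal sketch with three hypothetical states (figures invented for illustration); note that it differs from the plain average of the three state means, which would be about 0.333.

```python
def stratified_mean(strata):
    """strata: (mean for state sample, population of state) pairs.
    Returns sum(mean * population) / total population."""
    total = sum(pop for _, pop in strata)
    return sum(mean * pop for mean, pop in strata) / total

# Hypothetical figures for three states, 10 interviews each.
states = [(0.30, 14_000_000), (0.50, 2_000_000), (0.20, 160_000)]
print(round(stratified_mean(states), 4))   # 0.3238
```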


6. Nature and properties of general probability samples 


We can carry over the use of independent interpenetrating replicates 
to the general case without difficulty. We need only remember that the 
replicates must be independent. In the oak leaf example, the replicates 
must come from groups of independently selected trees. In the Man- 
hattan telephone book example, the replicates must be based on inde- 
pendently chosen lead names; in the national sample, the replicates 
must have members in every state. In every case they must interpene- 
trate, and do this independently. 

It is clear from the discussion and examples that general probability
samples are inferior to self-weighting probability samples in two ways, 
for both simplicity of exposition and ease of analysis are decreased! If 
it were not for compensating advantages, general probability samples 
would not be used. The main advantages are: 

(1) better quality for less cost due to reduction in administrative
costs or prelisting cost,

(2) better quality for less cost because of better allocation of effort
over strata,

(3) greater possibility of making estimates for individual strata.

All three of these advantages can be illustrated on our examples. In the
general oak leaf example, in contrast to the first oak leaf example in 
Section 2, there is no need to determine the size (number of leaves) 
of all trees. This is a clear cost reduction, whether in money or time. 
Suppose that, in the Manhattan telephone book sample, one aim was 
an opinion study restricted to those of Polish descent. Such persons’ 
names tend to be concentrated in the second part of the alphabet, so 
that the general sample will bring out more persons of Polish descent 
and the interviewing effort will be better allocated. In the case of the 
national sample of 480, the general sample, although probably giving a 
less stable national result, does permit (rather poor) state-by-state 
estimates where the self-weighting sample would skip Nevada about 
half the time. 

It is perhaps worth mentioning at this point that, if cost is proportional 
to the total number of individuals without regard to number of strata 
or the distribution of interviews among strata, the optimum allocation
of interviews is proportional to the product 

(size of stratum) times (standard deviation within stratum). 
In particular, optimum allocation calls for sample strata not in propor- 
tion to population strata. If we weight appropriately, disproportionate 
samples will be better than proportionate ones—if we choose the dis- 
proportions wisely. 
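The allocation rule can be sketched directly. To compare designs, this sketch also uses the usual variance of a stratified mean, the sum over strata of (stratum share)² × (within-stratum SD)² / (stratum sample size), ignoring finite-population corrections; that formula is a standard approximation assumed here, not stated in the text. The two strata and the function names are hypothetical.

```python
def optimum_allocation(total_n, strata):
    """Allocate total_n interviews in proportion to
    (size of stratum) * (standard deviation within stratum).
    strata: (size, sd) pairs. Returns unrounded allocations."""
    products = [size * sd for size, sd in strata]
    scale = total_n / sum(products)
    return [p * scale for p in products]

def variance_of_weighted_mean(strata, allocation):
    """sum(W_h**2 * sd_h**2 / n_h), with W_h the stratum's population share;
    finite-population corrections ignored (assumed approximation)."""
    total_size = sum(size for size, _ in strata)
    return sum((size / total_size) ** 2 * sd ** 2 / n
               for (size, sd), n in zip(strata, allocation))

# Hypothetical strata: (number of people, standard deviation within stratum).
strata = [(600_000, 10.0), (400_000, 30.0)]
proportional = [600.0, 400.0]
optimum = optimum_allocation(1000, strata)
print([round(n, 1) for n in optimum])   # [333.3, 666.7]
print(round(variance_of_weighted_mean(strata, proportional), 3),
      round(variance_of_weighted_mean(strata, optimum), 3))   # 0.42 0.324
```

The smaller, more variable stratum gets the larger share of interviews, and the appropriately weighted disproportionate sample has the smaller variance.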

In specifying the characteristics of a probability sample at the
beginning of this paper, we required that there be a sampled population,
a population from which the sample comes and each member of which 
has a chance of entering the sample. We have not said whether or not 
this is exactly the same population as the population in which we are 
interested, the target population. In practice they are rarely the same, 
though the difference is frequently small. In human sampling, for ex- 
ample, some persons cannot be found and others refuse to answer. The 
issues involved in this difference between sampled population and 
target population are discussed at some length in Part II, and in 
chapter III-D of Appendix D in our complete report. 


7. Stratification and adjustment 


In many cases general probability samples can be thought of in 
terms of 

(1) a subdivision of the population into strata, 

(2) a self-weighting probability sample in each stratum, and 

(3) combination of the stratum sample means weighted by the size
of the stratum.

The general Manhattan telephone book sample can be so regarded.
There are two strata, one made up of names beginning in A, B, …,
N, O, and the other made up of names beginning in P, Q, …, Y, Z.
Similarly the general national sample may be thought of as made up
of 48 strata, one for each state.

This manner of looking at general probability samples is neat, often 
helpful, and makes the entire legitimacy of unequal weighting clear in 
many cases. But it is not general. For in the general oak leaf example, 
if there were any strata they would be whole trees or parts of trees. 
And not all trees were sampled. (Still every leaf was fairly represented 
by its equal chance of affecting the weighted sample mean.) We can- 
not treat this case as one of simple stratification. 

The stratified picture is helpful, but not basic. It must fail as soon 
as there are more potential strata than sample elements, or as soon as 





24 AMERICAN STATISTICAL ASSOCIATION JOURNAL, MARCH 1954 


the number of elements entering the sample from a certain stratum is 
not a constant of the sampling plan. It usually fails sooner. There is no 
substitute for the relative chances that different individuals or sampling 
units have of entering the sample. This is the basic thing to consider.

There is another relation of stratification to probability sampling. 
When sizes of strata are known, there is a possibility of adjustment. 
Consider taking a simple random sample of 100 adults in a tribe where 
exactly 50% of the adults were known to be males and 50% females. 
Suppose the sample had 60 males and 40 females. If we followed the 
pure probability sampling philosophy so far expounded, we should 
take the equally weighted sample mean as our estimate of the popula- 
tion average. Yet if 59 of the 60 men had herded sheep at some time 
in their lives, and none of the 40 women, we should be unwise in esti- 
mating that 59% of the tribe had herded sheep at some time in their 
lives. The adjusted mean 


$$50\left(\frac{59}{60}\right) + 50\left(\frac{0}{40}\right) = 49\tfrac{1}{6}$$


is a far better indicator of what we have learned. 
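The adjusted mean weights each sex's sample proportion by its known population share rather than by its share of the sample. A minimal sketch of that arithmetic (the function name is hypothetical); it does not implement the quantitative rule for deciding whether to adjust.

```python
def adjusted_mean(strata):
    """strata: (known population share, number with the trait, sample count).
    Weights each stratum's sample proportion by its known share."""
    return sum(share * (k / n) for share, k, n in strata)

tribe = [(0.50, 59, 60),   # men: 59 of 60 have herded sheep
         (0.50, 0, 40)]    # women: none of 40
unadjusted = (59 + 0) / (60 + 40)
print(f"unadjusted {100 * unadjusted:.1f}%, adjusted {100 * adjusted_mean(tribe):.2f}%")
# unadjusted 59.0%, adjusted 49.17%
```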

How can adjustment fail? Under some conditions the variability of 
the adjusted mean is enough greater than that of the unadjusted mean 
to offset the decrease in bias. It may be a hard choice between adjust- 
ment and nonadjustment. 

The last example was extreme, and the unwise choice would be made 
by few. But, again, less extreme cases exist, and the unwise choice, 
whether it be to adjust or not to adjust, may be made rather easily 
(and probably has been made many times). A quantitative rule is 
needed. One is given in chapter V-C of the complete report. In the 
preceding example the relative sizes of the strata were known exactly. 
It turns out that inexact knowledge can be included in the computa- 
tion without great increase in complexity. 

An example in Kinsey’s area is cited by one critic of the Kinsey report: 

These weighted estimates do not, of course, reflect any population changes 
since 1940, which introduces some error into the statistics for the present 
total population. Moreover, on some of the very factors that Kinsey demon- 
strates to be correlated with sexual behavior, there are no Census data 
available. For example, religious membership is shown to be a factor affect- 
ing sexual behavior, but Census data are lacking and no weights are assigned. 
While the investigators interviewed members of various religious groups, 
there is no assurance that each group is proportionately represented, be- 
cause of the lack of systematic sampling controls. Thus, the proportion of 
Jews in Kinsey’s sample would seem to be at least 13 per cent whereas their
true proportion in the population is of the order of 4 per cent.¹


Do we know the percentage of Jews well enough to make an adjust- 
ment for it? If we can assess the stability of the “4%” figure, the pro- 
cedure of Chapter V-C will answer this question. Failing this technique, 
we could translate the question into more direct terms as follows: 
“In considering Kinsey’s results, do we want to have 13 per cent 
Jews or 4 per cent Jews in the sampled population?” and try to answer 
with the aid of general knowledge and intuition. 

We have discussed the adjustment of a simple random sample. The 
same considerations apply to the possibility of adjusting any self- 
weighting or general probability sample. No new complications arise 
when adjustment is superposed on weighting. The presence of a com- 
plication might be suspected in the case where not all segments appear 
in the sample, and we attempt to use these segments as strata. Careful 
analysis shows the absence of the complication, as may be illustrated 
by carrying our example further. 

Suppose that the sheep-herding tribe in question contains a known, 
very small percentage of adults of indeterminate sex, and that none 
have appeared in our sample. To be sure, their existence affected, albeit 
slightly, the chances of males and females entering the sample, but it 
does not affect the thinking which urged us to take the adjusted mean. 
We still want to adjust, and have only the question “Adjust for 
what?” to answer. 

If the fraction of indeterminate sex is 0.000002, and the remainder 
are half males and half females, and if our anthropological expert feels 
that about 1 in 7 of the indeterminate ones has herded sheep, we have a 
choice between 


$$.499999\left(\frac{59}{60}\right) + .499999\left(\frac{0}{40}\right) + .000002\left(\frac{1}{7}\right),$$

which represents adjustment for three strata, one measured subjectively, and

$$.500000\left(\frac{59}{60}\right) + .500000\left(\frac{0}{40}\right),$$

which represents adjustment for the two observed strata.
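The two adjusted means can be computed side by side. In this sketch the rate for the unobserved stratum is simply the anthropologist's judgment of 1/7 entered as a number; the function name is hypothetical.

```python
def adjusted_from_rates(strata):
    """strata: (population share, estimated rate) pairs; a rate may be an
    observed sample proportion or a subjective judgment."""
    return sum(share * rate for share, rate in strata)

three_strata = adjusted_from_rates([(0.499999, 59 / 60), (0.499999, 0 / 40),
                                    (0.000002, 1 / 7)])   # 1/7: expert judgment
two_strata = adjusted_from_rates([(0.500000, 59 / 60), (0.500000, 0 / 40)])
print(f"{100 * three_strata:.5f}% vs {100 * two_strata:.5f}%")
```

The two results differ only in the fifth decimal place of the percentage.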
Clearly, in this extreme example, the choice is immaterial. Clearly,
also, the estimated accuracy of the anthropologist’s judgment must
enter. We can again use the methods of Chapter V-C.

1 Hyman, H. H. and Sheatsley, P. B. “The Kinsey report and survey methodology,” International
Journal of Opinion and Attitude Research, Vol. 2 (1948), 184-85.


8. Upper semiprobability sampling 


Let us be a little more realistic about our botanist and his sample of 
oak leaves. He might have an aerial photograph, and be willing to 
select 100 trees at random. But any ladder he takes into the field is 
likely to be short, and he may not be willing to trust himself in the 
very top of the tree with lineman’s climbing irons. So the sample of 
10 leaves that he chooses from each selected tree will not be chosen at 
random. The lower leaves on the tree are more likely to be chosen than 
the highest ones. 

In the two-stage process of sampling, the first stage has been a prob- 
ability sample, but the second has not (and may even be entirely un- 
planned!). These are the characteristic features of an upper semiprob- 
ability sample. As a consequence, the sampled population agrees with 
the target population in certain large-scale characteristics, but not in 
small-scale ones and, usually, not in other large-scale characteristics. 

Thus, if in the oak leaf example we use the weights appropriate to 
different sizes of tree, as we should, the sampled population of leaves will 

(1) have the correct relative number of leaves for each tree, but 

(2) have far too many lower leaves and far too few upper leaves.

The large-scale characteristic of being on a particular tree is a matter
of agreement between sampled and target populations. The large-scale
characteristic of height in the tree (and many small-scale characteristics
that the reader can easily set up for himself) is a matter of serious
disagreement between sampled and target populations.

The sampled population differs from the target population within 
each segment, here a tree, although sampled population segments and 
target population segments are in exact proportion. 

If infestation varies between the bottoms and the tops of the trees, 
this type of sampling will be biased, and, while the inferences from 
sample to sampled population will be correct, they may be useless or 
misleading because of the great difference between sampled popula- 
tion and target population. 
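A small simulation can make the oak-leaf argument concrete. Everything in it is hypothetical: the forest, the assumption that infestation rises linearly with height, and the “ladder” rule that only leaves in the lower half of a tree can be picked. Trees are drawn at random and each tree’s sample proportion is weighted by its leaf count, so the between-tree stage is a probability sample; only the within-tree stage changes.

```python
import random

def make_forest(rng, n_trees=300):
    """Hypothetical forest: each leaf is (height, infested), height 0 = bottom, 1 = top.
    Assumed for illustration: infestation chance rises from 0.1 at the bottom to 0.5 at the top."""
    forest = []
    for _ in range(n_trees):
        leaves = []
        for _ in range(rng.randint(100, 600)):
            height = rng.random()
            leaves.append((height, rng.random() < 0.1 + 0.4 * height))
        forest.append(leaves)
    return forest

def two_stage_estimate(forest, rng, n_trees=100, per_tree=10, reach=None):
    """Stage 1: trees drawn at random. Stage 2: leaves drawn at random among those
    below `reach` (None = every leaf eligible). Each tree's sample proportion is
    weighted by the tree's leaf count."""
    weighted_sum = total_leaves = 0.0
    for leaves in rng.sample(forest, n_trees):
        eligible = [leaf for leaf in leaves if reach is None or leaf[0] < reach]
        picked = rng.sample(eligible, min(per_tree, len(eligible)))
        proportion = sum(infested for _, infested in picked) / len(picked)
        weighted_sum += len(leaves) * proportion
        total_leaves += len(leaves)
    return weighted_sum / total_leaves

rng = random.Random(1954)
forest = make_forest(rng)
target = sum(infested for tree in forest for _, infested in tree) / sum(len(tree) for tree in forest)
probability = two_stage_estimate(forest, rng)
ladder = two_stage_estimate(forest, rng, reach=0.5)
print(f"target {target:.3f}  probability sample {probability:.3f}  ladder sample {ladder:.3f}")
```

Under these assumptions the target rate should sit near 0.3 and the probability-sample estimate close to it, while the ladder estimate settles near 0.2: tree weighting is correct in both, and the bias comes entirely from selectivity within trees.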

Such dangers always exist with any kind of nonprobability sampling. 
Upper semiprobability sampling is no exception. By selecting the trees 
at random we have stultified biases due to probable selectivity be- 
tween trees, and this is good. But we have done nothing about almost 
certain selectivity between leaves on a particular tree—this may be 
all right, or very bad. It would be nice always to have probability
samples, and avoid these difficulties. But this may be impractical. (The
conditions under which a nonprobability sample may reasonably
be taken are discussed in Part III.)

There is one point which needs to be stressed. The change from prob- 
ability sampling within segments (in the example, within trees) to some 
other type of sampling, perhaps even unplanned sampling, shifts a 
large and sometimes difficult part of the inference from sample to 
target population—shifts it by moving the sampled population away 
from the target population toward the sample—shifts it from the 
shoulders of the statistician to the shoulders of the subject matter 
“expert.” Those who use upper semiprobability samples, or other 
nonprobability samples, take a heavier load on themselves thereby. 

Upper semiprobability samples may be either self-weighting or gen- 
eral. The “quota samples” of the opinion pollers, where interviewers are 
supposed to meet certain quotas by age, sex, and socioeconomic status, 
are rather crude forms of upper semiprobability samples, and are often 
self-weighting. Bias within segments arises, some contribution being 
due, for example, to the different availability of different 42-year-old
women of the middle class. The sampled population may contain sexes, 
ages, and socioeconomic classes in the right ratios, but retiring persons 
are under-represented (and hermits are almost entirely absent) in 
comparison with the target population. 

Election samples of opinion, although following the same quota pat- 
tern, will ordinarily only be self-weighting within states (if we ignore 
the “who will vote” problem). Predictions are desired for individual 
states. If Nevada had a mere 100 cases in a self-weighting sample, the 
total size of a national sample would have to be about 100,000. When 
national percentages are to be compiled, it would be foolish not to 
weight each state mean in accordance with the size of the state. No one 
would favor, we believe, weighting each state equally just because 
there may be (and probably are) biases within each state. 
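The Nevada arithmetic and the state weighting can be sketched as follows. The 0.1 per cent population share used for Nevada is an approximation assumed here to reproduce the text’s figure, and the three-state means and sizes in the usage lines are invented for illustration.

```python
def self_weighting_total(cases_wanted, smallest_share):
    """Total sample size needed so that a self-weighting sample gives the smallest
    state `cases_wanted` cases, when that state holds `smallest_share` of the population."""
    return cases_wanted / smallest_share

def national_mean(state_means, state_sizes):
    """Weight each state mean by the state's share of the national population."""
    total = sum(state_sizes.values())
    return sum(state_means[s] * state_sizes[s] / total for s in state_means)

def equal_weight_mean(state_means):
    """The alternative the text rejects: every state counted equally."""
    return sum(state_means.values()) / len(state_means)

# Assumed: Nevada holds roughly 0.1 per cent of the population.
print(round(self_weighting_total(100, 0.001)))

# Hypothetical state results (proportion favoring a candidate) and population sizes.
means = {"Large": 0.55, "Medium": 0.40, "Small": 0.30}
sizes = {"Large": 10_000_000, "Medium": 2_000_000, "Small": 160_000}
print(f"{national_mean(means, sizes):.3f} vs equal weights {equal_weight_mean(means):.3f}")
```

Equal weighting here lets the small states pull the national figure far from the population-weighted value, which is why the text favors weighting by state size despite possible within-state biases.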

Disproportionate samples and unequal weights are just as natural and 
wise a part of upper semiprobability sampling as they are of
probability sampling. The difficulties of upper semiprobability sampling do
not lie here; instead they lie in the secret and insidious biases due to 
selectivity within segments. 

Our sampling of names from the Manhattan telephone directory 
might conceivably be drawn by listing the numbers called by subscrib- 
ers on a certain exchange during a certain time, and then taking into 
the sample names from each exchange in proportion to the names listed 
for the exchange. The result would be an upper semiprobability sample 
with substantial selectivity within the segments, which here are
exchanges. The nature of this selectivity would depend on the time of
day at which the listing was made. 

Whether all segments are represented in an upper semiprobability 
sample or not, the segments may be used as strata for adjustment. The 
situation is exactly similar to that for probability sampling. The only 
difficulty worthy of note is the difficulty of assessing the stability of the 
various segment means. 
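Using segments as strata for adjustment amounts to the calculation sketched below (a minimal version with helper names of our own): each segment’s sample mean is weighted by the segment’s population fraction. For a segment absent from the sample, the sketch accepts an outside judgment of its mean, in the spirit of the anthropologist’s 1-in-7 estimate; without one it refuses to proceed. The usage lines assume the sheep-herding sample held 60 men (59 had herded sheep) and 40 women (none had).

```python
from collections import defaultdict

def segment_means(sample):
    """Mean value in each segment that has at least one sample case."""
    totals, counts = defaultdict(float), defaultdict(int)
    for segment, value in sample:
        totals[segment] += value
        counts[segment] += 1
    return {seg: totals[seg] / counts[seg] for seg in counts}

def adjusted_mean(sample, pop_fractions, judged=None):
    """Weight each segment mean by its population fraction.
    sample: list of (segment, value); pop_fractions: segment -> fraction (summing to 1);
    judged: optional segment -> outside estimate for segments missing from the sample."""
    judged = judged or {}
    means = segment_means(sample)
    result = 0.0
    for segment, fraction in pop_fractions.items():
        if segment in means:
            result += fraction * means[segment]
        elif segment in judged:
            result += fraction * judged[segment]
        else:
            raise ValueError(f"no sample cases and no judged mean for segment {segment!r}")
    return result

def raw_mean(sample):
    return sum(value for _, value in sample) / len(sample)

# Hypothetical records: 1 = has herded sheep, 0 = has not.
sample = [("male", 1)] * 59 + [("male", 0)] + [("female", 0)] * 40
print(raw_mean(sample), adjusted_mean(sample, {"male": 0.5, "female": 0.5}))
print(adjusted_mean(sample, {"male": 0.499999, "female": 0.499999, "indeterminate": 0.000002},
                    judged={"indeterminate": 1 / 7}))
```

The gap between the raw mean (0.59) and the adjusted mean (about 0.492) is the between-segment bias that adjustment removes; any selectivity within the male or female segment is left untouched.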

Independent interpenetrating replicate subsamples can be used to 
estimate stabilities of over-all or segment means in upper semiprobabil- 
ity samples without difficulty, if we can obtain a reasonable facsimile 
of independence in taking the different subsamples. They provide, if 
really independent, respectable bases for inference from sample to 
sampled population. We still have a nonprobability sample, however, 
and there is no reason for the sampled population to agree with the 
target population. The problem is just reduced to “What was the 
sampled population?” 
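For k independent interpenetrating subsamples of equal size, a standard way to turn the replicates into a stability estimate is to treat the k subsample means as k independent observations of the same quantity: their average estimates the overall mean, and their standard deviation divided by √k estimates its standard error. The sketch assumes equal-sized subsamples; the four means in the usage line are invented.

```python
import math
import statistics

def replicate_estimate(subsample_means):
    """Combine k independent, equal-sized subsample means into an overall mean
    and an estimated standard error for it."""
    k = len(subsample_means)
    if k < 2:
        raise ValueError("need at least two independent subsamples")
    overall = statistics.fmean(subsample_means)
    standard_error = statistics.stdev(subsample_means) / math.sqrt(k)
    return overall, standard_error

# Invented means from four interpenetrating subsamples.
overall, se = replicate_estimate([0.42, 0.47, 0.44, 0.45])
print(f"mean {overall:.3f}, standard error {se:.4f}")
```

This measures stability with respect to the sampled population only; it says nothing about the distance between the sampled and target populations.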

What finally is the situation with regard to bias in an upper semi- 
probability sample? We shall have a weighted mean or an adjusted one. 
In either case, any bias originally contributed by selectivity between 
segments will have been substantially removed. But, in either case, the 
contribution to bias due to selectivity within segments will remain
unchanged. This is an unknown, and hence additionally dangerous, sort of
bias.

The great danger in weighting or adjusting such samples is not so 
much that weighting or adjusting may make the results worse (as
it will from time to time) but rather that its use may cause the user to 
feel that his values are excellent because they are “weighted” or “ad- 
justed” and hence to neglect possible or likely biases within segments. 
Like all other nonprobability sample results, weighted means from
upper semiprobability samples should be presented and interpreted
with caution.

9. Salvage of unplanned samples 


What can we do for such samples? We can either try to improve the 
results of their analysis, or try to inquire how good they are anyway. 
We may try to improve either actual quality, or our belief in that 
quality. The first must come by way of weighting or adjustment;
the second must involve checking sample characteristics against
population characteristics.

Weighting is impossible, since we cannot construct a sampling plan
and hence cannot estimate chances of entering the sample in any other 
manner than by observing the sample itself. So all that we can do 
under this head is to adjust. We recall the salient points about adjust- 
ment, which are the same in a complete salvage operation as they are 
in any other situation: 

(1) The population is divided into segments.

(2) Each individual in the sample can be uniquely assigned to a segment.

(3) The population fraction is either known with inappreciable error or estimated with known stability.

(4) The procedures of Chapter V-C of Appendix C of the complete report are applied to determine whether, or how much, to adjust.

After adjustment, what is the situation as to bias? Even worse than
with upper semiprobability sampling, because if we do not adjust, we
cannot escape bias by turning to weighting. In summary:

(1) whether adjusted or not, the result contains all the effects of all the selectivity exercised within segments, while

(2) if adjustment is refused by the methods of Chapter V-C, we face additional biases resulting from selectivity between segments, of a magnitude comparable with the difference between unadjusted and adjusted mean.

This is, to put it mildly, not a good situation. 

Clearly even more caution is needed in presenting and interpreting 
the results of a salvage operation on an unplanned sample than for 
any of the other types of sample discussed previously. (If it were not 
for the psychological danger that adjustment might be regarded as 
cure, the caution required for results based on the original, unad- 
justed, unplanned sample would, however, be considerably greater.) 

Having adjusted or not as seems best, what else can we do? Only 
something to make ourselves feel better about the sample. Some other 
characteristic than that under study can sometimes be compared in 
the adjusted sample and in the population. A large difference is evi- 
dence of substantial bias within segments. Good agreement is comfort- 
ing, and strengthens the believability of the adjusted mean for the 
characteristic of interest. The amount of this strengthening depends 
very much on the a priori relation between the two characteristics. 

Some would say that an unplanned sample does not deserve adjust- 
ment, but the discussion in Part II indicates that if any sort of a sum- 
mary is to be made, it might as well, in principle, be an adjusted mean. 

II. SYSTEMATIC ERRORS


In order to understand how systematic errors in sampling should be 
treated, it seems both necessary and desirable to fall back on the 
analogy with the treatment of systematic errors in measurement. No 
clear account of the situation for sampling seems to be available in the 
literature, although understanding of the issues is a prerequisite to the 
critical assessment of nonprobability samples. On the other hand, one 
of physical science’s greatest and most recurrent problems is the
treatment of systematic errors.


10. The presence of systematic errors 


Almost any sort of inquiry that is general and not particular involves 
both sampling and measurement, whether its aim is to measure the 
heat conductivity of copper, the uranium content of a hill, the visual 
acuity of high school boys, the social significance of television, or the
sexual behavior of the (white) human (U.S.) male. Further, both the 
measurement and the sampling will be imperfect in almost every case. 
We can define away either imperfection in certain cases. But the re- 
sulting appearance of perfection is usually only an illusion. 

We can define the thermal conductivity of a metal as the average 
value of the measurements made with a particular sort of apparatus, 
calibrated and operated in a specified way. If the average is properly 
specified, then there is no “systematic” error of measurement. Yet even 
the most operational of physicists would give up this definition when 
presented with a new type of apparatus, which standard physical 
theory demonstrated to be less susceptible to error. 

We can relate the result of a sampling operation to “the result that 
would have been obtained if the same persons had applied the same 
methods to the whole population.” But we want to know about the 
population and not about what we would find by certain methods. In 
almost all cases, applying the method to the “whole” population would 
miss certain persons and units. 

Recognizing the inevitability of (systematic) error in both meas- 
urement and sampling, what are we to do? Clearly, attempt to hold the 
combined effect of the systematic errors down to a reasonable value. 
What is reasonable? This must depend on the cost of further reduc- 
tion and the value of accurate results. How do we know that our sys- 
tematic errors have been reduced sufficiently? We don’t! (And neither 
does the physicist!) We use all the subject-matter knowledge, informa- 
tion and semi-information that we have—we combine it with what- 
ever internal evidence of consistency it seems worthwhile to arrange 
for the observations to provide. The result is not foolproof. We may
learn new things and do better later, but who expects the last word on
any subject?

In 1905, a physicist measuring the thermal conductivity of copper 
would have faced, unknowingly, a very small systematic error due to 
the heating of his equipment and sample by the absorption of cosmic 
rays, then unknown to physics. In early 1946, an opinion poller, study- 
ing Japanese opinion as to who won the war, would have faced a very 
small systematic error due to the neglect of the 17 Japanese holdouts, 
who were discovered later north of Saipan. These cases are entirely 
parallel. Social, biological and physical scientists all need to remember 
that they have the same problems, the main difference being the 
decimal place in which they appear. 

If we admit the presence of systematic errors in essentially every 
case, what then distinguishes good inquiry from bad? Some reasonable 
criteria would seem to be: 

(1) Reduction of exposure to systematic errors from either meas- 
urement or sampling to a level of unimportance, if possible 
and economically feasible, otherwise 

(1′) Balancing reasonably well the assignment of available resources to reduction of systematic or variable errors in either measurement or sampling, in order to obtain a reasonable amount of information for the “money.”

(2) Careful consideration of possible sources of error and careful 
examination of the numerical results. 

(3) Presentation of results and inferences in a manner which ade- 
quately points out both observed variability and conjectured 
exposure to systematic error. 

In many situations it is easy, and relatively inexpensive, to reduce 
the systematic errors in sampling to practical unimportance. This is 
done by using a probability sampling plan, where the chance that any 
individual or other primary unit shall enter the sample is known, and 
allowed for, and where adequate randomness is ensured by some 
scheme of (mechanical) randomization. The systematic errors of such 
a sample are minimal, and frequently consist of such items as: 

(a) failure of individuals or primary units to appear on the “list” from which selection has been made,

(b) persons perennially “not at home” or samples “lost,”

(c) refusals to answer or breakdowns in the measuring device.

These are the hard core of causes of systematic error in sampling.
Fortunately, in many situations their effect is small; there a
probability sample will remove almost all the systematic error due to
sampling.


11. Should a probability sample be taken? 


But this does not mean that it is always good policy to take prob- 
ability samples. The inquirer may not be able to “afford” the cost in 
time or money for a probability sample. The opinion pollers do not 
usually afford a probability sample (instead of designating individuals 
to be interviewed by a random, mechanical process, they allow their 
interviewers to select respondents to fill “quotas”) and many have 
criticized them for this. Yet the behavior of the few probability 
samples in the 1948 election (see pp. 110-112 of The Pre-election 
Polls of 1948, Social Science Research Council Report No. 60) does 
not make it clear that the opinion pollers should spend their limited 
resources on probability samples for best results. (Shifts toward a 
probability sample have been promised, and seem likely to be wise.) 

The statement “he didn’t use a probability sample” is thus not a 
criticism which should end further discussion and doom the inquiry to 
the cellar. It is always necessary to ask two questions: 

(a) Could the inquirer afford a probability sample? 

(b) Is the exposure to systematic error from a nonprobability sample small enough to be borne?

If the answer is “no” to both, then the inquiry should not be, or 
have been, made—just as would be the case with a physical inquiry 
if the systematic errors of all the forms of measurement which the 
physicist could afford were unbearably large. 

If the answer is “yes” to the first question and “no” to the second, 
then the failure to use a probability sample is very serious, indeed. 

If the answer is “yes” to both, then careful consideration of the eco- 
nomic balance is required—however it should be incumbent on the 
inquirer using a nonprobability sample to show why it gave more 
information per dollar or per year. (As statisticians, we feel that the 
onus is on the user of the nonprobability sample. Offhand we know of 
no expert group who would wish to lift it from his shoulders.) 

If the answer is “no” to the first question, and “yes” to the second, 
then the appropriate reaction would seem to be “lucky man.” 
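The four combinations of answers to questions (a) and (b) can be laid out as a lookup table. This is only a restatement of the verdicts given in the text; the function name and the wording of the entries are ours.

```python
# Keys: (can the inquirer afford a probability sample?,
#        is the nonprobability sample's exposure to systematic error bearable?)
VERDICTS = {
    (False, False): "the inquiry should not be, or have been, made",
    (True, False): "failure to use a probability sample is very serious",
    (True, True): "weigh the economics; the onus is on the nonprobability sample",
    (False, True): "lucky man",
}

def verdict(can_afford_probability: bool, nonprobability_error_bearable: bool) -> str:
    return VERDICTS[(can_afford_probability, nonprobability_error_bearable)]

for afford in (True, False):
    for bearable in (True, False):
        print(f"afford={afford!s:5}  bearable={bearable!s:5}  ->  {verdict(afford, bearable)}")
```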

Having admitted that the sampling, as well as the measurement, 
will have some systematic errors, how then do we do our best to make 
good inferences about the subject of inquiry? Sampling and measure- 
ment being on the same footing, we have only to copy, for the sampling 
area, the procedure which is well established and relatively well under- 
stood for measurement. This procedure runs about as follows: 

We admit the existence of systematic error—of a difference between
the quantity measured (the measured quantity) and the quantity of 
interest (the target quantity). We ask the observations about the meas- 
ured quantity. We ask our subject matter knowledge, intuition, and 
general information about the relation between the measured quantity 
and the target quantity. 

We can repeat this nearly verbatim for sampling: 

We admit the existence of systematic error—of a difference between 
the population sampled (the sampled population) and the population 
of interest (the target population). We ask the observations about the 
sampled population. We ask our subject matter knowledge, intuition, 
and general information about the relation between sampled popula- 
tion and target population. 

Notice that the measured quantity is not the raw readings, which 
usually define a different measured quantity, but rather the adjusted 
values resulting from all the standard corrections appropriate to the 
method of measurement. (Not the actual gas volume, but the gas 
volume at standard conditions!) Similarly, the result for the sampled 
population is not the raw mean of the observations, which usually de- 
fines a different sampled population, but rather the adjusted or 
weighted mean, all corrections, weightings and the like appropriate to 
the method of sampling having been applied. Weighting a sample ap- 
propriately is no more fudging the data than is correcting a gas volume 
for barometric pressure. 
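The parallel between correcting a gas volume and weighting a sample can be put side by side. The gas correction uses the combined gas law, with standard conditions taken as 273.15 K and 101.325 kPa; the weighted mean weights each observation by the reciprocal of its known chance of selection (a Hájek-type ratio estimator, our choice of illustration). The numbers in the usage lines are illustrative only.

```python
def gas_volume_at_standard(volume, pressure_kpa, temp_k,
                           std_pressure_kpa=101.325, std_temp_k=273.15):
    """Correct an observed gas volume to standard conditions (combined gas law)."""
    return volume * (pressure_kpa / std_pressure_kpa) * (std_temp_k / temp_k)

def weighted_mean(values, selection_probs):
    """Weight each observation by 1 / (its chance of entering the sample)."""
    weights = [1.0 / p for p in selection_probs]
    return sum(w * y for w, y in zip(weights, values)) / sum(weights)

# One litre observed at 25 degrees C (298.15 K) and 100 kPa.
print(round(gas_volume_at_standard(1.0, 100.0, 298.15), 4))

# Two units from an over-sampled group (chance 0.2), two from an under-sampled group (chance 0.1).
values = [1, 1, 0, 0]
probs = [0.2, 0.2, 0.1, 0.1]
print(sum(values) / len(values), round(weighted_mean(values, probs), 4))
```

In both cases the raw figure describes something other than the quantity of interest, and the correction uses known features of the method (pressure and temperature, or selection chances) to move the result toward it.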

The third great virtue of probability sampling is the relative definite- 
ness of the sampled population. It is usually possible to point the 
finger at most of the groups in the target population who have no chance 
to enter the sample, who therefore were not in the sampled population; 
and to point the finger at many of the groups whose chance of entering 
the sample was less than or more than the chance allotted to them in 
the computation, who therefore were fractionally or multiply repre- 
sented in the sampled population. When a nonprobability sample is 
adjusted and weighted to the best of an expert’s ability, on the other 
hand, it may still be very difficult to say what the sampled population 
really is. (Selectivity within segments cannot be allowed for by weights 
or adjustments, but it arises to some extent in every nonprobability 
sample and alters the sampled population.) 


12. The value and conditions of adjustment 


Some would say that correcting, adjusting and weighting most non- 
probability samples is a waste of time, since you do not know, when 
this process has been completed, to what sampled population the 
adjusted result refers. This is entirely equivalent to saying that it
does not pay to adjust the result of a physical measurement for a 
known systematic error because there are, undoubtedly, other system- 
atic errors and some of them are likely to be in the other direction. 
Let us inquire into good practice in the measurement situation, and 
see what guidance it gives us for the sampling situation. 

When will the physicist adjust the result for the known systematic
error? When (i) he has the necessary information and (ii) the
adjustment is likely to help. The necessary information includes a 
theory or empirical formula, and the necessary observations. Empirical 
formulas and observations are subject to fluctuations, so that adjust- 
ment will usually change the magnitude of fluctuations as well as alter- 
ing the systematic error. The adjustment is likely to help unless the 
supposed reduction of systematic error coincides with a substantial 
increase in fluctuations. 
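One simple way to make “likely to help” concrete is to compare mean squared errors: an adjustment helps when the squared systematic error it removes exceeds the variance it adds. This is our illustration, not the procedure of Chapter V-C, and the bias and variance figures below are hypothetical.

```python
def mean_squared_error(bias, variance):
    return bias ** 2 + variance

def adjustment_helps(bias_raw, var_raw, bias_adj, var_adj):
    """True when the adjusted value has the smaller mean squared error."""
    return mean_squared_error(bias_adj, var_adj) < mean_squared_error(bias_raw, var_raw)

# Large known systematic error, modest increase in fluctuation: adjust.
print(adjustment_helps(bias_raw=0.10, var_raw=0.0004, bias_adj=0.02, var_adj=0.0009))
# Tiny known systematic error, adjustment more than doubles the variance: do not bother.
print(adjustment_helps(bias_raw=0.005, var_raw=0.0004, bias_adj=0.0, var_adj=0.0009))
```

In practice the residual bias after adjustment is never known exactly, so `bias_adj` is itself a judgment rather than a measured quantity.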

If the known systematic error is so small as not to 

(1) affect the result by a meaningful amount, or 

(2) affect the result by an amount likely to be as large as, or a substantial fraction of, the unknown systematic errors,

then the physicist will report either the adjusted or the unadjusted
value. If he reports the unadjusted value, he should state that the
adjustment has been examined, and is less than such-and-so. To do
this, either he must have calculated the adjustment or he must have
had generally applicable and strong evidence that it is small.

In any event, his main care, which he will not always take, must be 
to warn the reader about the dangers of further systematic errors, 
perhaps, in some cases, even by saying bluntly that “the adjusted 
value isn’t much better than the raw value,” and then provide raw 
values for those who wish to adjust their own. 

If the physicist is aware of systematic errors of serious magnitude 
and has no basis for adjustment, his practice is to name the measured 
quantity something, like Brinell hardness, Charpy impact strength,
or, if he is a chemist, iodine value, heavy metals as Pb, etc. By analogy,
those who feel that the combination of recall and interview technique
makes Kinsey’s results subject to great systematic error might well
define “KPM sexual behavior” as a standard term,² and work with this.

By analogy then, when should a nonprobability sample be adjusted
in principle? (Most probability samples are made to be weighted anyway;
this is part of the design and must be carried out.) When (i)
we have the necessary information and (ii) the adjustment is
likely to help. The necessary information will usually consist of facts
or estimates of the true fractions in the population of the various
segments.

2 The letters KPM stand for Kinsey, Pomeroy and Martin, the authors of Sexual Behavior in the Human Male.

When is the adjustment likely to help? This problem has usually 
been a ticklish point requiring technical knowledge and intuition. A 
quantitative solution is now given in Chapter V-C of Appendix C in 
the complete report. With this as a guide, it should be possible to make 
reasonable decisions about the helpfulness of adjustment. 

If the decision is to adjust, we should accept the sampled population 
corresponding to the adjusted mean, and calculate the adjustment. 
We then report the adjusted value, unless the adjustment is small, 
when we may report the unadjusted value with the statement that 
the adjustment alters it by less than such-and-so. 

Our main care, which we may not always take, must be to warn the 
reader about the dangers of further lack of representativeness, perhaps, 
in some cases, even by saying bluntly that “the adjusted mean isn’t 
much better than the raw mean, even if we took 20 pages to tell you 
how we did it and six months to do it,” and to provide raw means for 
those who wish to adjust their own. 

If we were prepared to report an unadjusted mean, we were clearly 
inviting inference to some sampled population. Adjustment will give 
us a sampled population that is usually nearer to the target population. 
Hence we should adjust. 

If we cannot adjust, and must present raw data which we feel 
badly needs adjustment, we may say that this is what we found in 
these cases—take ’em or leave ’em. Except from the point of view of 
protecting the reader from over-belief in the results, this would seem 
to be a counsel of despair. By analogy with the physicist, it seems better 
to introduce “KPM sexual behavior” and its analogs in such situations. 





DO PERSONS LOST TO LONG TERM OBSERVATION 
HAVE THE SAME EXPERIENCE AS 
PERSONS OBSERVED? 


EVALUATION OF ANTISYPHILITIC THERAPY* 
ALBERT P. ISKRANT, AND QUENTIN R. REMEIN†


MEDICAL research has long been faced with the problem of patients
lapsing from observation during the evaluation of treatment. To
date methods of analysis of results of therapy have assumed that the 
patients who lapse from observation would have had the same experi- 
ence as those who remained under observation. To the extent that this 
assumption is not correct the calculation of the results of therapy is in 
error. It has been claimed that the “observed” group is weighted in favor
of failures, since patients who relapse would return for more treatment
while those who are getting along all right would not bother to return
for posttreatment observation. On the other hand, it has been stated that
the failures go elsewhere for retreatment instead of to the source of the
original treatment and thus bias a study in favor of “good” results.

In an effort to test the hypothesis underlying current evaluation of 
treatment (viz., the “not observed” would have had the same experi- 
ence as the observed) a special study in the evaluation of therapy for 
syphilis, the Blue Star Research Study, was initiated by the Division of 
Venereal Disease, U.S. Public Health Service, in which an attempt was 
made to hold a group of patients to 100 per cent follow-up and com- 
pare the results of therapy with those obtained when no intensive 
follow-up effort was made [1]. These patients included 560 persons 
with secondary syphilis confirmed by darkfield examination who had 
had no previous antisyphilitic therapy of any kind. The follow-up ef- 
fort was over 90 per cent effective over a period of two years, largely 
due to the work of the physicians in the cooperating treatment facilities
and specially trained research investigators whose sole functions
have been to assist the physicians in selecting patients and to see that
these patients are held to follow-up observation.

* Presented before the session on Statistical Evaluation of Clinical Data, American Statistical Association Annual Meeting, December 28, 1951.

† Dr. Theodore J. Bauer, formerly Chief, Division of Venereal Disease, U.S. Public Health Service, represents the physicians and nurses who, over the years, devoted their diligent efforts to the treatment and observation of Blue Star Research Study patients. Mr. Donohue, Principal Statistician, and Mr. Larsen, Health Program Representative, represent the research investigators whose tireless and careful work in selecting and holding patients to observation has made possible the collection of the data utilized in this paper. Mr. Iskrant, formerly Principal Statistician of the Division of Venereal Disease, and Mr. Remein, Statistician, Division of Venereal Disease, prepared this statistical analysis.

The determination of response to therapy in syphilis requires ob- 
servation of the patient for a long period of time following treatment. 
In the primary or secondary stage, relapse following treatment gen- 
erally occurs within two years, if at all, although it may take as long 
as twenty years or more to determine whether late, disabling effects 
occur (such as paresis, tabes dorsalis, and aneurysm). This study is 
limited to a two-year evaluation of response to therapy for secondary 
syphilis. The presence of relapse is determined by physical examination, 
darkfield microscopy of sera from lesions, and blood and spinal fluid 
serologic tests. Results of the blood test are usually reported quantita- 
tively in titer units based on successive twofold dilutions of the pa- 
tient’s serum. When treatment is successful, the titer gradually de- 
clines until negativity is reached a number of months after treatment. 
After the initial schedule of treatment is completed, no further therapy 
is administered unless lesions reappear, unless a serologic relapse
evidenced by a sustained rise in serologic titer occurs, or a high titer (32
Kahn units or more) is sustained for one year or more following ther- 
apy [1]. Death from syphilis rarely occurs in the first few years after 
onset. 
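The three retreatment criteria can be expressed as a small rule check. The text does not define a “sustained rise” precisely, so the sketch assumes one for illustration: two consecutive readings, each at least fourfold above the lowest earlier titer (the factor is a parameter). It also reads “sustained for one year or more” as every reading from treatment through the first reading at or after month 12 staying at 32 Kahn units or above. The function names and the data layout are hypothetical.

```python
def sustained_rise(titers, factor=4, consecutive=2):
    """Assumed rule: `consecutive` readings in a row, each at least `factor` times the
    lowest earlier titer (a negative reading, 0 units, is treated as 1 for this test)."""
    lowest = None
    run = 0
    for _, units in titers:
        if lowest is not None and units >= factor * max(lowest, 1):
            run += 1
            if run >= consecutive:
                return True
        else:
            run = 0
            lowest = units if lowest is None else min(lowest, units)
    return False

def high_titer_sustained(titers, threshold=32, months=12):
    """Assumed reading: every titer from treatment through the first reading at or
    after `months` stays at or above `threshold` Kahn units."""
    for month, units in titers:
        if units < threshold:
            return False
        if month >= months:
            return True
    return False

def needs_retreatment(lesions_reappeared, titers):
    """titers: list of (months after treatment, Kahn units) in time order."""
    return lesions_reappeared or sustained_rise(titers) or high_titer_sustained(titers)

# Hypothetical patients.
print(needs_retreatment(False, [(1, 64), (3, 16), (6, 4), (9, 0), (12, 0)]))  # titer declines
print(needs_retreatment(False, [(1, 64), (3, 8), (6, 32), (7, 64)]))          # rise after a low
print(needs_retreatment(False, [(1, 64), (6, 32), (12, 32)]))                 # stays at 32 or more
```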

The criteria (other than diagnostic) taken into consideration in the 
selection of patients for the Blue Star Research Study are willingness of 
the patient to cooperate in long term follow-up, good general health,
residential stability, and, in a measure, personal stability (i.e., alco- 
holics, drug addicts, etc. were excluded if known). It was felt that these 
characteristics would not affect the relapse rate to an appreciable ex- 
tent. In spite of screening, however, some such unstable persons were 
included in the study because their characteristics were not known at 
the time of selection. 

In the following pages the experience of these intensively followed
patients is used in two different ways to test the validity of the
hypothesis that patients who lapse from observation would have had 
the same experience as those who remained under observation, first, by 
analysis of the results for patients within the special study and second, 
by comparison of results of special study patients with results for an- 
other group of patients who were followed less intensively. 


METHOD I 


This analysis is limited to patients treated for secondary syphilis 
who were selected for the special intensive follow-up study. Three 
schedules of treatment, all utilizing crystalline penicillin G, are included.

1. A total of 2,800,000 units of aqueous penicillin alone—25,000 units 
administered every three hours for fourteen days. 

2. A total of 2,800,000 units of aqueous penicillin with conjunctive 
arsenoxide and bismuth—25,000 units of penicillin administered 
every three hours for 14 days and a total of 4 to 6 mg. of arsen- 
oxide per kilogram of body weight (or a total of 300 mg. in persons 
weighing over 60 kg.) and 600 mg. of bismuth. 

3. A total of 3,400,000 units of aqueous penicillin—40,000 units ad- 
ministered every two hours for seven days. 
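The schedule totals can be checked by counting doses, assuming one injection at each interval across the stated period (the helper is ours). The count reproduces 2,800,000 units for schedules 1 and 2 and gives 3,360,000 units for schedule 3, which appears to be rounded to 3,400,000 in the text.

```python
def total_units(units_per_dose, interval_hours, days):
    """Total penicillin units when one dose is given every `interval_hours` for `days` days."""
    doses = (days * 24) // interval_hours
    return doses * units_per_dose

print(total_units(25_000, 3, 14))  # schedules 1 and 2: 112 doses
print(total_units(40_000, 2, 7))   # schedule 3: 84 doses
```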

Almost all of the patients treated on these schedules have had the op- 
portunity to be observed for at least two years, and sufficient cases for 
evaluation have been accumulated. Patients were observed monthly for 
the first year and quarterly during the second year. 

Follow-up of the patients was secured by the research investigator 
stationed at each cooperating facility. A variety of follow-up tech- 
niques was used by each investigator including letters, telegrams, tele- 
phone calls, visits to patients, etc. The cooperation given by public 
clinics and physicians in many parts of the country made follow-up 
continuity possible for many patients who moved beyond commuting 
distance from the treating facility. Not all patients proved to be
cooperative, and these cases would have been lost to follow-up had it not
been for the diligence of the investigators. The success of follow-up is
indicated in Table 1, where it can be seen that approximately 95 per
cent of the patients under study were observed for two years or until
retreatment, or are still under observation at this writing (about 4
years since the study began). No significant differences among the
three schedules in the percentage of patients under observation were
noted. (The probability of obtaining, through chance alone, differences
giving a chi-square greater than the one observed is slightly more than
6 out of 10.)


TABLE 1
SPECIAL STUDY IN TREATMENT OF SECONDARY SYPHILIS
POSTTREATMENT OBSERVATION SUCCESS, BY
TREATMENT SCHEDULE

                 2,800,000 units      2,800,000 units       3,400,000 units
                 Penicillin Alone     Pen., Ars. & Bis.     Penicillin Alone
                 Number  Per Cent     Number  Per Cent      Number  Per Cent

Under Observation 95.8 92.7 147 95.5
  Retreated before 2 years 30 16
  2 years or more since treatment 124
  Less than 2 years since treatment 9 7
Lost to Observation 7 4.5
  Lapsed 7
  Died (not assoc. with syphilis) 0
Total number treated 100.0 154 100.0

In consultation with the research investigators, and using detailed records
of interviews with patients and of the amount of effort required to keep
patients under observation, the patients have been classified according
to whether or not they would probably have been lost to routine methods
of follow-up. Three kinds of patients were considered as becoming lost to
routine follow-up during the two-year observation period: those who
missed two or more appointments without a good reason; those who
frequently required special attention in obtaining observations, such as
home visits or provision of transportation to the facility, even when no
observations were actually missed; and those who moved out of the area
served by the facility. For each patient who would have been lost to
routine follow-up, the posttreatment observation period in which he
would have lapsed was estimated as closely as possible.
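The three criteria for "would probably have been lost to routine follow-up" can be written as a simple rule. A minimal Python sketch follows. The record fields (`missed_without_reason`, `needed_special_attention`, `moved_out_of_area`, `lapse_month`) are hypothetical summaries of the investigators' notes. In the study the lapse period was a judgment reached with the investigators, so here it is simply supplied as an input rather than computed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FollowUpRecord:
    """Hypothetical summary of an investigator's notes on one patient."""
    missed_without_reason: int = 0           # appointments missed without a good reason
    needed_special_attention: bool = False   # e.g. home visits, transportation provided
    moved_out_of_area: bool = False          # left the area served by the facility
    lapse_month: Optional[int] = None        # estimated month routine follow-up would fail


def lost_to_routine(rec: FollowUpRecord) -> bool:
    """Apply the three criteria described in the text."""
    return (
        rec.missed_without_reason >= 2
        or rec.needed_special_attention
        or rec.moved_out_of_area
    )


def estimated_lapse(rec: FollowUpRecord) -> Optional[int]:
    """Month in which the patient would probably have lapsed, or None if cooperative."""
    return rec.lapse_month if lost_to_routine(rec) else None


mover = FollowUpRecord(moved_out_of_area=True, lapse_month=7)
cooperative = FollowUpRecord(missed_without_reason=1)
print(estimated_lapse(mover), estimated_lapse(cooperative))
```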

It was our original intention to divide the research cases into two 
groups, the cooperative and the uncooperative, and compare the cumu- 
lative retreatment rates in both groups. Unfortunately there was no 
way of determining which patients would be cooperative and which 
uncooperative except by observing the patient’s behavior. Obviously 
the longer the time over which the patient was observed the greater 
the opportunity for showing uncooperativeness. Therefore, those pa- 
tients who were retreated in the early posttreatment months did not 
have a chance to become uncooperative and hence in the early months 
the presumably cooperative patients showed higher retreatment rates. 

We therefore decided to make our comparison on a more realistic
basis by comparing the results for all patients in the study with the
results that would have been obtained with the same cases had they
not had this concentrated follow-up. In essence, this amounts to
comparing the retreatment rate of the same patients without and with
intensive follow-up, the former resembling the findings in actual practice.

The method of calculation of the retreatment rates is that used by
the Division of Venereal Disease and previously described by several of
us [3, 4]. Briefly, in this method the retreatment rate is calculated by
making appropriate adjustment for the loss of patients from observa-
tion. The total used for computing retreatment, seropositivity, and
seronegativity rates is adjusted by including the same proportion of
retreated as non-retreated patients remaining under observation. Then,
if aₙ persons are observed in period n or later, of whom cₙ persons are
retreated in period n, and the adjusted total cases are denoted by eₙ
(with e₁ = a₁),

$$e_n = \frac{a_n \, e_{n-1}}{a_{n-1} - c_{n-1}}$$

The retreatment rate in period n is 100 × cₙ/eₙ, and the cumulative
retreatment rate is the sum of the retreatment rates for all individual
observation periods through n. The per cent seropositive or seronega-
tive in period n is simply the number seropositive or seronegative di-
vided by the adjusted total cases, times 100.
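The recursion for the adjusted total can be checked directly against the tables. The sketch below (Python, with a hypothetical function name `adjusted_rates`) computes eₙ, the period retreatment rates and their running sum for the first four periods of the routine follow-up half of Table 4. It reads aₙ as the retreated plus seropositive plus seronegative cases listed for period n; that reading is an assumption about how the table columns map onto the symbols.

```python
from itertools import accumulate


def adjusted_rates(a, c):
    """Adjusted totals and retreatment rates from the recursion in the text.

    a[n] -- persons observed in period n or later (index 0 is the first period)
    c[n] -- persons retreated in period n
    Returns (e, rates, cumulative), with rates in per cent.
    """
    e = [float(a[0])]  # first period: no earlier losses, so e_1 = a_1
    for n in range(1, len(a)):
        # e_n = a_n * e_(n-1) / (a_(n-1) - c_(n-1))
        e.append(a[n] * e[n - 1] / (a[n - 1] - c[n - 1]))
    rates = [100.0 * cn / en for cn, en in zip(c, e)]
    cumulative = list(accumulate(rates))
    return e, rates, cumulative


# Routine follow-up, 3,400,000-unit schedule, periods 0-1 to 3-4 (Table 4).
a = [148, 140, 130, 122]
c = [0, 1, 0, 2]
e, rates, cumulative = adjusted_rates(a, c)
print([round(x) for x in e], round(cumulative[-1], 1))
```

The rounded adjusted totals, 148, 140, 131 and 123, and the cumulative rate of 2.3 per cent agree with the figures printed in Table 4 for those periods.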

The retreatment rate obtained by this method is the same as that ob- 
tained by the method commonly referred to as “the life table method.” 
The seronegativity and seropositivity rates differ somewhat since “the 
life table method” cumulates sustained seronegativity rates whereas 
the method described here computes the seronegativity rate for each 
particular period based only on cases observed in that period or later. 
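The equality with the life-table retreatment rate can be seen from the recursion for eₙ itself. A short derivation, writing qₙ = cₙ/aₙ for the proportion retreated among those observed in period n and taking e₁ = a₁:

```latex
\frac{e_n}{a_n} = \frac{e_{n-1}}{a_{n-1} - c_{n-1}}
               = \frac{e_{n-1}}{a_{n-1}} \cdot \frac{1}{1 - q_{n-1}}
               = \prod_{k=1}^{n-1} \frac{1}{1 - q_k},
\qquad\text{so}\qquad
\frac{c_n}{e_n} = q_n \prod_{k=1}^{n-1} (1 - q_k),

\sum_{k=1}^{n} \frac{c_k}{e_k}
  = \sum_{k=1}^{n} \Bigl[\prod_{j=1}^{k-1}(1 - q_j) - \prod_{j=1}^{k}(1 - q_j)\Bigr]
  = 1 - \prod_{k=1}^{n} (1 - q_k).
```

The right-hand side of the last line is the life-table probability of retreatment by the end of period n; multiplied by 100, it is the cumulative retreatment rate defined above.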

The following two groups were tabulated and analyzed separately: 

A—All patients in the study (i.e., intensive follow-up): analysis of
the results of therapy including all posttreatment observations
through two years of follow-up on patients in the study.

B—Same patients with “routine” follow-up: analysis of the results
of therapy for the same patients, including all observations on the
cooperative patients but, for the uncooperative patients, only those
observations made prior to the posttreatment period in which they
would most likely have lapsed.

A couple of examples should help make clear precisely which observations
were included in group A and which in group B. Patient
L.M. was treated for secondary syphilis and observed for six months. 
At the end of this time he moved to another city 500 miles away from 
the treatment center. Arrangements were made for the patient to be 
observed by the local clinic in the new city of residence, and follow-up 
continued for the complete two year period. In group A the results of 
all observations on this patient are included for the two years. In 
group B observations on this patient are included for only six months; 
that is, until he moved. 

Or take another case, F. G., who was treated for secondary syphilis 
and was observed for 9 months following treatment. He did not appear 
for his tenth month examination. When the investigator found the pa- 
tient and interviewed him, the patient indicated that he believed him- 
self to be cured and had decided to stop coming in for examinations. 
The patient did, however, agree to accompany the investigator to the
clinic for an examination and serologic test. For the remainder of the 
two years, the investigator had to find the patient and bring him to the 
clinic for each examination. In group A all observations of this patient 
are included for the two years. In group B only the observations for 
nine months are included. 
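The rule that distinguishes group B from group A amounts to truncating each uncooperative patient's series of observations. A minimal Python sketch: the examination months follow the schedule described earlier (monthly for the first year, quarterly in the second), while the `lapse_month` values (7 for L.M., 10 for F.G.) and the function name are a hypothetical encoding of the two case histories.

```python
def routine_observations(observed_months, lapse_month=None):
    """Observations that would be available under routine follow-up (group B).

    Cooperative patients (lapse_month is None) keep every observation, as in
    group A; for others, only observations made before the period in which
    the patient would most likely have lapsed are kept.
    """
    if lapse_month is None:
        return list(observed_months)
    return [m for m in observed_months if m < lapse_month]


# Monthly for the first year, then quarterly through the second year.
schedule = list(range(1, 13)) + [15, 18, 21, 24]

lm_group_b = routine_observations(schedule, lapse_month=7)   # moved after six months
fg_group_b = routine_observations(schedule, lapse_month=10)  # stopped coming at month 10
print(len(schedule), lm_group_b[-1], fg_group_b[-1])
```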

Tables 2, 3, and 4 and Figure 1 show by treatment schedule the com- 
parative data on the results of therapy for the same patients for all 
observations and for those which would ordinarily be made without in- 
tensive follow-up. The greatest difference between the cumulative re- 
treatment rates for both groups amounted to only 1.6 per cent at the 
seventh month on the 3,400,000 units of penicillin schedule. At no point 
are there significant differences¹ in rates between the two follow-up
conditions (at the 5 per cent level of significance). At 21-24 months one schedule
shows no difference in retreatment rates, and the others show differences 
in opposite directions. The largest difference was 1.1% in the 3,400,000 
unit schedule. The probability of getting larger differences than this 
by chance alone is nearly one half. 
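The test of footnote 1 can be sketched numerically for the 21-24 month comparison on the 3,400,000-unit schedule. The Python below is an illustration, not the authors' computation: it takes the cumulative rates printed in Table 4 (10.7 and 9.6 per cent) as the proportions, counts 11 retreated cases in the routine follow-up rows of that table, uses N = 154, and replaces the t table with the normal approximation. The function name is hypothetical.

```python
import math


def finite_population_t(p_pop, p_sample, retreated_in_sample, n_treated):
    """t-value of the routine (sample) cumulative proportion against the
    intensive (population) value, following the formulas of footnote 1.

    p_pop               -- cumulative proportion retreated, intensive follow-up
    p_sample            -- cumulative proportion retreated, routine follow-up
    retreated_in_sample -- number of retreated cases seen under routine follow-up
    n_treated           -- N, the patients treated on the schedule
    """
    # Effective sample size: no losses, same proportion and number retreated.
    e = retreated_in_sample / p_sample
    # Binomial variance with the finite-population correction (N - E)/(N - 1).
    var = p_pop * (1 - p_pop) * (n_treated - e) / (e * (n_treated - 1))
    return (p_sample - p_pop) / math.sqrt(var), e


t, e = finite_population_t(0.107, 0.096, 11, 154)
# Two-sided probability from the normal approximation to t (an assumption).
p = math.erfc(abs(t) / math.sqrt(2))
print(round(e, 1), round(t, 2), round(p, 2))
```

The resulting |t| of about 0.75 and two-sided probability of about 0.45 are consistent with the statement that the probability of a larger difference arising by chance is nearly one half.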





¹ In testing for significance, the intensively followed group has been considered the population, consisting of N treated patients. In the ith observation period since treatment, sᵢ patients were retreated, representing a proportion pᵢ retreated. In the sample obtained through routine follow-up, primes are used to denote the estimates. To determine whether the cumulative proportion retreated over r observation periods in the sample differs significantly from the cumulative of the parameter over these same intervals, the following formulas are applied:

$$\sigma^2_{\sum p'_i} = \frac{\left(\sum_{i=1}^{r} p_i\right)\left(1 - \sum_{i=1}^{r} p_i\right)(N - E)}{E\,(N - 1)}$$

where E, the effective sample size as adapted from Cornfield [2], is defined as the equivalent sample size in which there are no losses from observation but which yields the same proportion retreated and the same number of retreated cases as were observed; i.e.,

$$E = \frac{\sum_{i=1}^{r} s'_i}{\sum_{i=1}^{r} p'_i}$$

The t-test is then applied as follows:

$$t = \frac{\sum_{i=1}^{r} p'_i - \sum_{i=1}^{r} p_i}{\sigma_{\sum p'_i}}$$
TABLE 2

COMPARISON OF RESULTS OF THERAPY EVALUATION FOR
INTENSIVE AND “ROUTINE” FOLLOW-UP

SECONDARY SYPHILIS—CUMULATIVE RETREATMENT RATES

TREATMENT SCHEDULE: Penicillin—2,800,000 units—25,000
every 3 hours “Crystalline G” (14 days) No arsenoxide, no bismuth


A—Intensive Follow-up

Observation                             Not Re-treated Cases       Adjusted
Period      Retreated Cases         Seropositive     Seronegative     Total
(months)     No.       %  Cum. %      No.       %      No.       %   Cases*


B—Routine Follow-up

Observation                             Not Re-treated Cases       Adjusted
Period      Retreated Cases         Seropositive     Seronegative     Total
(months)     No.       %  Cum. %      No.       %      No.       %   Cases*


* Adjusted total cases in each period includes the number of cases observed in this period or later
and cases retreated in previous periods adjusted for losses from observation.
TABLE 3

COMPARISON OF RESULTS OF THERAPY EVALUATION FOR
INTENSIVE AND “ROUTINE” FOLLOW-UP

SECONDARY SYPHILIS—CUMULATIVE RETREATMENT RATES

TREATMENT SCHEDULE: Penicillin—2,800,000 units—25,000
every 3 hours “Crystalline G” 4-6 mg. arsenoxide per kg. of body
weight (or total of 300 mg. in persons weighing over 60 kg.) and 600
mg. of bismuth (14 days)


A—Intensive Follow-up

Observation                             Not Re-treated Cases       Adjusted
Period      Retreated Cases         Seropositive     Seronegative     Total
(months)     No.       %  Cum. %      No.       %      No.       %   Cases*


B—Routine Follow-up

Observation                             Not Re-treated Cases       Adjusted
Period      Retreated Cases         Seropositive     Seronegative     Total
(months)     No.       %  Cum. %      No.       %      No.       %   Cases*


* Adjusted total cases in each period includes the number of cases observed in this period or later
and cases retreated in previous periods adjusted for losses from observation.
TABLE 4

COMPARISON OF RESULTS OF THERAPY EVALUATION FOR
INTENSIVE AND “ROUTINE” FOLLOW-UP

SECONDARY SYPHILIS—CUMULATIVE RETREATMENT RATES

TREATMENT SCHEDULE: Penicillin—3,400,000 units—40,000
every 2 hours “Crystalline G” (7 days), No arsenoxide, no bismuth


A—Intensive Follow-up

Observation                             Not Re-treated Cases       Adjusted
Period      Retreated Cases         Seropositive     Seronegative     Total
(months)     No.       %  Cum. %      No.       %      No.       %   Cases*
0-1            —       —       —      153    99.4        1     0.6      154
1-2            1     0.7     0.7      145    94.8        7     4.6      153
2-3            —       —     0.7      122    79.7       30    19.6      153
3-4            3     2.0     2.7       98    64.1       51    33.3      153
4-5            3     2.0     4.7       79    51.6       67    43.8      153
5-6            —       —     4.7       71    46.4       75    49.0      153
6-7            4     2.6     7.3       59    38.6       83    54.2      153
7-8            1     0.7     8.0       51    33.3       90    58.8      153
8-9            2     1.3     9.3       39    25.7       99    65.2      152
9-10           —       —     9.3       32    21.1      106    69.8      152
10-11          —       —     9.3       29    19.2      108    71.6      151
11-12          —       —     9.3       24    15.9      113    74.9      151
12-15          1     0.7    10.0       19    12.8      115    77.4      149
15-18          —       —    10.0       19    12.8      115    77.4      149
18-21          1     0.7    10.7       14     9.5      118    80.0      148
21-24          —       —    10.7        9     6.5      115    83.0      139


B—Routine Follow-up

Observation                             Not Re-treated Cases       Adjusted
Period      Retreated Cases         Seropositive     Seronegative     Total
(months)     No.       %  Cum. %      No.       %      No.       %   Cases*
0-1            —       —       —      147    99.3        1     0.7      148
1-2            1     0.7     0.7      133    95.0        6     4.3      140
2-3            —       —     0.7      103    78.7       27    20.6      131
3-4            2     1.6     2.3       78    63.5       42    34.2      123
4-5            2     1.7     4.0       61    51.4       53    44.6      119
5-6            —       —     4.0       55    47.1       57    48.8      117
6-7            2     1.7     5.7       47    41.0       61    53.2      115
7-8            1     0.9     6.6       39    35.0       65    58.3      111
8-9            2     1.9     8.5       29    27.6       67    63.8      105
9-10           —       —     8.5       24    23.1       71    68.3      104
10-11          —       —     8.5       20    19.9       72    71.6      101
11-12          —       —     8.5       14    14.1       77    77.4      100
12-15          —       —     8.5       12    12.6       75    78.8       95
15-18          —       —     8.5       12    12.8       74    78.7       94
18-21          1     1.1     9.6        8     8.6       76    81.7       93
21-24          —       —     9.6        5     5.6       75    84.7       89

* Adjusted total cases in each period includes the number of cases observed in this period or later
and cases retreated in previous periods adjusted for losses from observation.
[Figure 1 panels: “Schedule: 2,800,000u Aqueous Crystalline Penicillin G (25,000 q 3 hrs)” and “Schedule: 2,800,000u Aqueous Crystalline Penicillin G (25,000 q 3 hrs) plus Arsenoxide and Bismuth”; horizontal axis: Posttreatment observation (months), 0 to 24.]

Fig. 1. Results of therapy evaluation—intensive and routine follow-up—secondary syphilis—cumulative retreatment rates.
If the intensively followed group is taken as the population and the
routinely followed group as a sample from that population, the re-
mainder of the population consists of those patients who would have
been lost to observation. Using the “effective-sample-size” notion [2],
this situation closely approximates a condition where two samples are
drawn from a population without replacement so that n₁ + n₂ = N. It
can be readily demonstrated that, where n₁ + n₂ = N,

$$\frac{\bar{x}_1 - \bar{x}_2}{\sigma_{\bar{x}_1 - \bar{x}_2}} = \frac{\bar{x}_1 - \mu}{\sigma_{\bar{x}_1 - \mu}} = \frac{\mu - \bar{x}_2}{\sigma_{\mu - \bar{x}_2}};$$

that is, the t-value of the difference between two sample means where
n₁ + n₂ = N is the same as the t-value of the difference between either
sample mean and the parameter. Applied to the present problem, this
indicates that there are no significant differences between the results
for persons remaining under observation and for those lost to observa-
tion, since no significant differences were noted between the routinely
followed and the intensively followed patients.
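The identity follows in two lines once μ is written as the mean of the whole population, Nμ = n₁x̄₁ + n₂x̄₂:

```latex
\bar{x}_1 - \mu = \frac{n_2}{N}\,(\bar{x}_1 - \bar{x}_2),
\qquad
\mu - \bar{x}_2 = \frac{n_1}{N}\,(\bar{x}_1 - \bar{x}_2),

\sigma_{\bar{x}_1 - \mu} = \frac{n_2}{N}\,\sigma_{\bar{x}_1 - \bar{x}_2},
\qquad
\sigma_{\mu - \bar{x}_2} = \frac{n_1}{N}\,\sigma_{\bar{x}_1 - \bar{x}_2}.
```

Because μ is a fixed parameter, each standard error carries the same constant factor as its difference, so n₂/N and n₁/N cancel and the three t-values coincide.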

Table 5 shows the percentage of total cases followed for two years;
the lowest figure, under routine follow-up, is about 50 per cent. Hence,
with at least 50 per cent follow-up, patients lost to observation in the
evaluation of treatment for syphilis very likely have the same experi-
ence as patients remaining under observation.
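The percentages in Table 5 are the number followed for 21-24 months divided by the total treated. A small Python check, with the table's figures typed in by hand (the dictionary layout and function name are illustrative), confirms that the lowest routine follow-up figure is the 51.3 per cent for the schedule with arsenoxide and bismuth.

```python
# Table 5: (total treated, followed 21-24 months intensively, followed routinely)
TABLE_5 = {
    "2,800,000 units Penicillin Alone": (167, 148, 107),
    "2,800,000 units Pen. with Ars. & Bis.": (193, 175, 99),
    "3,400,000 units Penicillin Alone": (154, 139, 89),
}


def per_cent(part, whole):
    """Percentage rounded to one decimal, as printed in the tables."""
    return round(100.0 * part / whole, 1)


intensive = {k: per_cent(f, n) for k, (n, f, _) in TABLE_5.items()}
routine = {k: per_cent(r, n) for k, (n, _, r) in TABLE_5.items()}
lowest = min(routine, key=routine.get)
print(lowest, routine[lowest])
```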


TABLE 5

POSTTREATMENT OBSERVATION UNDER INTENSIVE AND
ROUTINE CONDITIONS OF FOLLOW-UP

Special Study in Treatment of Secondary Syphilis

                                        A—Intensive Follow-up          B—Routine Follow-up
                                        Total    Followed 21-24 mos.   Total    Followed 21-24 mos.
Treatment Schedule                      Cases    Number  Per Cent      Cases    Number  Per Cent
                                        Treated                        Treated
2,800,000 units Penicillin Alone          167      148     88.6          167      107     64.1
2,800,000 units Pen. with Ars. & Bis.     193      175     90.7          193       99     51.3
3,400,000 units Penicillin Alone          154      139     90.3          154       89     57.8


METHOD 2


In this method we have compared the results of treatment with 
penicillin in a group of patients with “routine” follow-up with the re- 
sults among patients in the intensively followed group previously dis- 
cussed. Note that in Method 1 we compared patients followed rou- 
tinely with the same patients followed intensively; in Method 2 the 
groups are mutually exclusive. The Division of Venereal Disease re-
ceives records from many hospitals and clinics on treatment and fol- 
low-up of patients treated for early syphilis. Many schedules of treat- 
ment are included in these records which are analyzed periodically for 
comparative evaluation. The results of these evaluations are published 
by the Division for program guidance. The follow-up on these patients 
is not as intensive as in the special study. All patients are advised of 
the importance of posttreatment follow-up before they are discharged 
from the treating facility. Form letters are sent to all patients remind- 
ing them when examinations are due, and occasionally, visits are made 
to the patients by a representative of the health department. In this 
routine method follow-up is conducted entirely on an impersonal basis, 
and there are many lapses from observation. In order to test the valid- 
ity of a statistical evaluation with incomplete follow-up, a group of pa- 
tients treated with crystalline penicillin G from the “routine” evalua- 
tion was compared with the patients treated with crystalline penicillin 
G in the intensive follow-up group. Both groups of patients were treated 
in the same clinics and during the same time periods (July 1946- 
December 1948). 

Among patients with intensive follow-up the amount of penicillin 
ranged from 2,800,000 units to 4,200,000 units for an average of 
3,130,000 units.² Among the patients with routine follow-up, total dosage
ranged from 2,400,000 units to 4,800,000 units for an average of
3,369,000 units.² Available evidence indicates that a difference of
239,000 units in an average penicillin dosage of over 3,000,000 units 
would make very little difference in the retreatment rates in the two 
groups inasmuch as the dosage-response curve has very little slope at 
3,000,000 units and above. Therefore, the retreatment rates for the 
two groups can be expected to be approximately equal. 
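Footnote 2 explains that retreatment rates change arithmetically as dosage changes geometrically, which is why the averages were taken on a logarithmic scale. A small Python sketch of the geometric mean follows, applied to hypothetical doses rather than the study's records, together with the ratio of the two averages reported above.

```python
import math


def geometric_mean(doses):
    """Mean of the logarithms, transformed back to the original units."""
    return math.exp(sum(math.log(d) for d in doses) / len(doses))


# Hypothetical total doses for four patients (units); not data from the study.
doses = [2_800_000, 3_000_000, 3_400_000, 4_200_000]
arith = sum(doses) / len(doses)
geo = geometric_mean(doses)

# Ratio of the two group averages reported in the text.
ratio = 3_369_000 / 3_130_000
print(round(arith), round(geo), round(ratio, 3))
```

The geometric mean never exceeds the arithmetic mean, and on this scale the routine group's average dose is only about 7.6 per cent higher than the intensive group's.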

The intensive follow-up group includes 253 cases, with 92.1 per cent
observed for two years (or until retreated). The routine follow-up group
had 1,864 patients treated, of whom 41.7 per cent were observed for two
years (or until retreated). Results of therapy through 24 months are
presented in Table 6. At the 24th month the cumulative retreatment
rates are practically identical. Figure 2 presents a graphic comparison
of the results. It can be concluded that the intensively followed cases
were retreated earlier than those routinely followed, as evidenced by
the slightly higher retreatment rate in the intensively followed group
throughout the first year of observation. Also, cases with intensive
follow-up are recorded as having reversed to negative more rapidly in
the first six months than did cases routinely followed. Differences are
negligible after the first six months of posttreatment observation. The
fact that intensively followed cases were observed more frequently
than those under routine follow-up would account for the earlier detec-
tion of cases requiring retreatment among those intensively followed.
This would probably also account for differences in the rapidity of re-
versal to seronegative. In spite of the earlier retreatment and seronega-
tivity, however, the cumulative results from the 12th month on are
almost identical for both groups.

Fig. 2. Comparison of results of treatment with crystalline penicillin G
of secondary syphilis in a group with intensive follow-up and in a group with
routine follow-up.

² The geometric mean was used in this calculation because the dosage-response curve is such that
arithmetic changes in the retreatment rate are associated with geometric changes in dosage.


TABLE 6

RESULTS OF TREATMENT OF SECONDARY SYPHILIS WITH
CRYSTALLINE PENICILLIN G IN A GROUP WITH INTENSIVE
FOLLOW-UP AND IN A GROUP WITH “ROUTINE” FOLLOW-UP


Intensive Follow-up Group

Observation                             Not Re-treated Cases       Adjusted
Period      Retreated Cases         Seropositive     Seronegative     Total
(months)     No.       %  Cum. %      No.       %      No.       %   Cases*
0-1            1     0.4     0.4      250    98.8        2     0.8      253
1-2            3     1.2     1.6      224    89.6       22     8.8      250
2-3            —       —     1.6      182    72.8       64    25.6      250
3-4            2     0.8     2.4      137    55.0      106    42.6      249
4-5            4     1.6     4.0      111    44.8      127    51.2      248
5-6            6     2.4     6.4       89    36.0      142    57.5      247
6-7            5     2.0     8.4       76    30.9      149    60.6      246
7-8            1     0.4     8.8       69    28.1      155    63.0      246
8-9            5     2.1    10.9       54    22.2      163    66.9      244
9-10           —       —    10.9       46    19.1      169    70.0      241
10-11          1     0.4    11.3       40    16.6      174    72.1      241
11-12          1     0.4    11.7       34    14.1      179    74.2      241
12-15          3     1.2    12.9       26    10.8      183    76.2      240
15-18          2     0.8    13.7       23     9.6      184    76.6      240
18-21          3     1.3    15.0       19     7.9      184    77.0      239
21-24          —       —    15.0       11     4.7      187    80.2      233


Routine Follow-up Group

Observation                             Not Re-treated Cases       Adjusted
Period      Retreated Cases         Seropositive     Seronegative     Total
(months)     No.       %  Cum. %      No.       %      No.       %   Cases*
0-1            1     0.1     0.1    1,856    99.6        7     0.4    1,864
1-2            4     0.2     0.3    1,765    96.4       60     3.3    1,830
2-3            7     0.4     0.7    1,526    85.2      254    14.2    1,792
3-4           13     0.7     1.4    1,188    68.0      534    30.6    1,747
4-5           21     1.2     2.6      940    55.5      710    41.9    1,695
5-6           37     2.2     4.8      667    40.5      899    54.6    1,647
6-7           14     0.9     5.7      523    32.9      977    61.4    1,592
7-8           15     1.0     6.7      421    27.2    1,024    66.1    1,549
8-9           18     1.2     7.9      328    21.8    1,056    70.2    1,503
9-10          13     0.9     8.8      277    18.9    1,060    72.3    1,466
10-11         16     1.1     9.9      238    16.7    1,042    73.3    1,421
11-12         17     1.2    11.1      205    15.0    1,012    73.8    1,370
12-15         28     2.1    13.2      144    11.0      986    75.6    1,304
15-18          8     0.7    13.9       86     7.8      860    78.1    1,101
18-21          8     0.9    14.8       54     5.7      746    79.3      940
21-24          8     1.0    15.8       35     4.5      619    79.6      778

* Adjusted total cases in each period includes the number of cases observed in this period or later
and cases retreated in previous periods adjusted for losses from observation.


CONCLUSION AND DISCUSSION


Two methods have been used to present evidence concerning the
assumption that persons lost to observation have the same experience
as persons observed in the evaluation of treatment for syphilis. In the
first method, therapy results in a group of intensively followed patients
were compared with the results which would have occurred among the
same patients if intensive methods had not been used. In the second
method, therapy results in two mutually exclusive groups were com-
pared. One group consisted of the previously mentioned patients who
were followed intensively, and the other group consisted of routinely
followed patients in the same treatment centers. No significant differ-
ences in retreatment rates were observed by either method, and it is
our conclusion that, in these series, with at least 42 per cent complete
follow-up at two years after treatment, patients lost to observation had
the same experience as those who remained under observation.

While these findings are of considerable value in the evaluation of 
therapy for syphilis, there is no evidence that they apply to the
study of therapy response in other diseases. For instance, where death
from a disease frequently occurs after treatment, it is not at all likely 
that persons lost to observation would have the same experience as 
those observed. 
APPLICATIONS OF STATISTICAL METHODS TO
SEDIMENTARY ROCKS*


W. C. Krumbein
Northwestern University 


Statistical methods find wide application in geology, es- 
pecially in the study of textures and composition of sedimen- 
tary rocks. Certain apparent irregularities in the data, such as 
highly skewed distributions, use of weight instead of number 
frequencies, use of unequal class intervals, and some others, 
required development of special methods of statistical analy- 
sis. In part, logarithmic transformations permitted applica- 
tion of conventional methods to the data. Some sedimentary 
attributes approach Gaussian distributions with no complicating
factors. Mineral composition data commonly follow binomial or
Poisson distributions. Analysis of variance and experimental
design are becoming increasingly important in
further analysis of geological data. 


INTRODUCTION 


SEVERAL circumstances controlled the development of statistical
thinking in geology. The first fields opened to statistical analysis,
some 50 years ago, concerned data which did not seem to lend themselves
to then current methods of analysis. Mineral composition of
sediments, for example, yielded discrete rather than continuous dis- 
tributions; and size frequency distributions of sediments, although 
continuous, required use of unequal class intervals and were commonly 
highly skewed. Moreover, a single sand sample may contain millions 
of grains, so that weight percentage instead of number of grains was 
more convenient for expressing frequency. Contemporary textbooks on 
statistics said little or nothing about handling such irregular data. As 
a result, techniques adapted to these special needs were developed di- 
rectly by geologists. 

Histograms of sand analyses were used before 1900, and since about 
1925 logarithmic forms of histograms and cumulative curves have been 
in common use. The median and quartile deviation came into use about 
1930, and the median is still the most popular average for reporting 
sedimentary grain size data. In part, the inertia of established pro- 
cedure and the large amount of data published as median and quartile 
summaries have operated against use of more efficient statistics in
expressing size analysis data. As long ago as 1935, Eisenhart [10] applied
the chi-square test to sampling problems in sedimentation, and numer-
ous other workers, mentioned below, helped make available methods
of wider and more general applicability than those initially introduced.
Analysis of variance techniques have been used sporadically for about
a decade, and in 1946 Swineford and Swineford [33] published a com-
prehensive study based on a three-factor analysis of variance model.
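The median and quartile deviation are usually read from the cumulative curve of a size analysis. A minimal Python sketch, assuming hypothetical sieve data, linear interpolation on a log₂ diameter scale (in keeping with the geometric grade scale described later), and a quartile deviation expressed in log₂ units; the text does not say which of these conventions was used, and the function name `diameter_at` is illustrative.

```python
import math


def diameter_at(sizes_mm, pct_finer, target):
    """Diameter (mm) at which the cumulative weight per cent finer reaches
    `target`, interpolating linearly in log2(diameter)."""
    pts = sorted(zip(pct_finer, (math.log2(d) for d in sizes_mm)))
    for (p0, x0), (p1, x1) in zip(pts, pts[1:]):
        if p0 <= target <= p1:
            frac = 0.0 if p1 == p0 else (target - p0) / (p1 - p0)
            return 2.0 ** (x0 + frac * (x1 - x0))
    raise ValueError("target lies outside the measured curve")


# Hypothetical sieve analysis: sieve openings (mm) and cumulative weight % finer.
sizes = [0.125, 0.25, 0.5, 1.0, 2.0]
finer = [5.0, 25.0, 60.0, 90.0, 100.0]

median = diameter_at(sizes, finer, 50)
d25, d75 = diameter_at(sizes, finer, 25), diameter_at(sizes, finer, 75)
# Quartile deviation expressed in log2 (grade) units rather than millimetres.
qd_log2 = (math.log2(d75) - math.log2(d25)) / 2
print(round(median, 3), d25, round(d75, 3), qd_log2)
```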

Other fields of geology, notably paleontology and geomorphology, 
not faced initially with discrete or logarithmic distributions, adopted 
standard methods of analysis from the start. Applications in these fields 
have been expanded since the 1930’s, and the process was accelerated 
by publication of Simpson and Roe’s Quantitative Zoology in 1939 [32]. 
In 1948 and 1949 analysis of variance and multivariate analysis were 
applied in paleontology by Burma [3] and Miller [22]. In 1950 Strahler 
[34] studied relations between samples and populations of surface slopes 
in geomorphologic analysis. 

Although modern statistical methods are being used to some extent 
in geology, the general state of statistical knowledge among geologists 
is rather unsatisfactory. Students seldom have courses in the subject, 
and some geology teachers perhaps tend to emphasize graphic pro- 
cedures with little attention to underlying theory. It seems fair to 
state, however, that there is an expanding interest in the subject and a 
corresponding increase of appreciation of what statistics can and can- 
not do. 


PROPERTIES OF SEDIMENTARY ROCKS 


Inasmuch as this paper is addressed to an audience of statisticians, 
it seems appropriate to define the scope of the present subject. Sedi- 
mentary rocks are deposits of solid materials on the earth’s surface 
produced by mechanical, chemical, or biological agencies in any me- 
dium (air, water, glacial ice) under normal conditions of the surface. 
All sedimentary rocks have attributes of composition, texture, and 
structure. Composition refers to mineralogical or chemical make-up of 
the rock; texture refers to characteristics of the grains or particles and 
the grain-to-grain relations among them; and structure refers to larger 
features of the deposit, such as stratification, geometrical attitude of 
the strata, and included organic remains. 

Textural and compositional properties of sediments have been 
studied in more detail by statistical methods than have sedimentary 
structures, although many geological field studies involve the statistics 
of structures. Grain orientation, directions of cross-bedding, and attitude
of rock fractures have received some statistical treatment by
Reiche [30], Chayes [5], and Pincus [28].

Inasmuch as textural and compositional features lend themselves 
well to laboratory study, a vast amount of statistical data has accumu- 
lated on these properties. Table 1 defines some textural and composi- 
tional properties of sediments, indicates those which have been quanti- 
fied, and suggests the nature of the distributions obtained in each. 
Most work has been done on particle size distribution and mineral
composition. Particle shape (sphericity and roundness) has been fairly
extensively studied, whereas surface textures (such as frosted grains,
striated grains, etc.) have hardly been approached statistically because
the definitions cannot at present be operationally converted to numbers.
The interested reader will find a discussion of methods and geological
evaluation of the results in Krumbein and Pettijohn [18] and Pettijohn
[27].


TABLE 1

STATISTICAL ASPECTS OF SEDIMENTARY TEXTURES,
COMPOSITION, AND MASS PROPERTIES

Particle Size
    Definition: Expressed as sieve mesh, intercepts, or in terms of settling velocity.
    Within single samples*: Log normal.
    Among closely spaced samples*: Log mean is normally distributed.

Particle Sphericity
    Definition: Cube root of ratio between particle volume and volume of circumscribing sphere.
    Among closely spaced samples*: Mean sphericity is normally distributed.

Particle Roundness
    Definition: Ratio of average radii of edges to radius of circle inscribed in maximum projection plane.
    Among closely spaced samples*: Mean roundness is normally distributed.

Particle Surface Texture
    Definition: Minute surface irregularities on particles. Definitions not quantified.
    Among closely spaced samples*: Percentage of frosted grains is normally distributed.

Particle Orientation
    Definition: Orientation of particle axes or planes in space.
    Within single samples*: Normal or circular normal.
    Among closely spaced samples*: Mean orientation is normally distributed.

Mineral Composition
    Definition: Percentage composition of minerals present.
    Within single samples*: Discrete distributions. Binomial and Poisson(?) distributions.
    Among closely spaced samples*: Percentages of some minerals are normally distributed.

Porosity
    Definition: Percentage of pore space in aggregate.
    Within single samples*: Normal.
    Among closely spaced samples*: Normally distributed.

Permeability
    Definition: Measure of ease of fluid flow through aggregates.
    Within single samples*: Log normal(?).
    Among closely spaced samples*: Log normal(?) distribution.

Natural Moisture Content
    Definition: Percentage of moisture in freshly collected samples.
    Within single samples*: Normal.
    Among closely spaced samples*: Normally distributed.

* Frequency distributions. Exceptions to the generalisations in these columns occur, but available data suggest that most
distributions approach normalcy or log normalcy in their behavior.

Aggregate properties (mass properties) of sediments depend on the 
associations of particles present in the deposit. Only three of a large 
number of aggregate properties are listed in the table. Some statistical 
work has been done on most mass properties, but in many instances it 
was confined to determination of mean values and degrees of spread. 

Table 1 distinguishes between distributions of the variates within 
single samples, as against distributions of mean values from closely- 
spaced samples. Mass properties usually yield only a single value per 
sediment sample, although subsamples from a larger sample tend to 
distribute themselves as shown. 

Information available for the last column of Table 1 is somewhat 
meager, although the kinds of distributions listed appear to apply. In 
many instances numerical characteristics of sedimentary phenomena 
show exponential rates of change when studied over long distances or 
in large areas. Frequency distributions of sample means taken over such
larger areas are sometimes skewed, and may in some instances approach
log normalcy. Although some percentage data are normally distributed 
as indicated, exceptions occur among rarer constituents. 

Because size distributions and mineral data were among the earliest 
investigated, they are used here for illustration to indicate the growth 
of statistical methodology in sedimentation. 


PARTICLE SIZE DISTRIBUTION OF SEDIMENTS


Sedimentary particles range in size from the order of 10~ to 10‘ mm. 
in diameter. Some sediments such as glacial till include this entire 
range; others have very restricted ranges of size, such as dune sand, 
which extends from about 0.1 to 1.0 mm. All workers with soils and
sediments (geologists, soil scientists, engineers) realized early that some 
sort of geometric size scale is necessary to facilitate analysis and permit 
comparison of data. In geology the most widely used grade scale is 
based on the ratio 2. The reference value is 1.0 mm. and the scale extends in both directions, as 2, 4, 8, ... mm., and ½, ¼, ⅛, ... mm.

Aside from technical problems of size analysis, not considered here, 
early workers had the problem of statistical analysis of data arranged
in unequal classes. It was felt that each size grade should have equal 
geometric significance (i.e., a change from 1 to 2 microns may be as 
important as a change from 1 to 2 mm.), and the uniformity or non- 
uniformity of the distribution should be expressible in such manner 
that fine or coarse sediments can be described in similar terms. These 
conditions were satisfied at an early date by the simple expedient of 
drawing histograms with equal width blocks for each geometric grade 
size, regardless of the absolute value of the class limits. The earliest
such histograms known to the writer were used by Udden in 1898 [35]. 

There is no clear evidence that an implied log transformation was 
recognized at the time, although shortly after arithmetic cumulative 
curves were introduced in 1920, they were converted to their log equiv- 
alents by plotting them on semilog paper. These practices are still fol- 
lowed, inasmuch as nearly all histograms are shown with equal width 
blocks, and most cumulative curves are drawn on semilog paper for 
direct reading of median and quartiles. Log probability paper was used 
in the middle 1930’s and in 1936 the writer [17] introduced a log transformation to facilitate conventional statistical analysis. The transformation is given by the relation $\phi = -\log_2 d$, where d is the diameter in mm. The minus sign was introduced to fit the scale to graphical methods commonly used by geologists. The phi notation permitted direct application of moment analysis to size data and permitted definition of
a normal phi curve described by the phi mean and phi standard devia- 
tion. This concept was extended to a phi Gram-Charlier series, and be- 
came the basis for graphic methods introduced by Otto [25] and ex- 
tended by Inman [15]. Use of the phi mean and phi standard deviation 
instead of median and quartiles also permits convenient extensions of 
analysis to curve fitting, use of Chi square, applications of analysis of 
variance, etc. 
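As a minimal sketch of the phi computations described above, the Python below converts diameters with φ = −log₂ d and computes moment measures from grouped weight-percentage data. It assumes that each grade is represented by its phi midpoint and that the sieve openings follow the 2:1 grade scale; the function names `phi` and `phi_moments` are illustrative, not taken from the writer's papers.

```python
import math

def phi(d_mm: float) -> float:
    """Phi value of a grain diameter given in millimetres: phi = -log2(d)."""
    return -math.log2(d_mm)

def phi_moments(limits_mm, weight_pct):
    """Phi mean, standard deviation, skewness and kurtosis from grouped sieve data.

    limits_mm  -- sieve openings in mm from coarsest to finest (one more entry than weight_pct)
    weight_pct -- weight per cent retained in each grade between consecutive openings
    """
    if len(limits_mm) != len(weight_pct) + 1:
        raise ValueError("need one more class limit than there are grades")
    # On the phi scale every grade of a 2:1 scale has width 1, so midpoints are evenly spaced.
    mids = [(phi(a) + phi(b)) / 2 for a, b in zip(limits_mm, limits_mm[1:])]
    total = sum(weight_pct)
    freqs = [w / total for w in weight_pct]  # relative frequencies summing to 1
    mean = sum(f * m for f, m in zip(freqs, mids))
    m2, m3, m4 = (sum(f * (m - mean) ** r for f, m in zip(freqs, mids)) for r in (2, 3, 4))
    sd = math.sqrt(m2)
    return mean, sd, m3 / sd ** 3, m4 / m2 ** 2  # kurtosis is 3 for a normal curve

if __name__ == "__main__":
    # Hypothetical symmetric sand: grades 2-1, 1-0.5, 0.5-0.25, 0.25-0.125 mm
    print(phi_moments([2, 1, 0.5, 0.25, 0.125], [10, 40, 40, 10]))
```

For this hypothetical sand the phi mean is 1.0 (that is, 0.5 mm) and the skewness is zero, as expected for a curve that is symmetrical on the phi scale.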

Although many sediments approach log normalcy, the finer-grained 
sediments tend to be only partly symmetrized by the phi transforma- 
tion. The writer experimented with other transformations to normalize 
these distributions, but on the whole it seems that most sediments can 
be described by the first four phi moments. 

A problem of some interest in size analysis is the use of weight per- 
centage frequency instead of number frequency. This question has been 
examined by Krumbein and Pettijohn [18], on the basis of earlier work 
by Hatch [14], who showed that if the number frequency distribution 
is log normal, the corresponding weight-frequency distribution is also 
log normal, with the same log standard deviation, but with a systematic change in the log mean. In part, the problem appears to be one of convenience; for most analyses the number of grains per sample is very large and weighing is a more convenient means of expression. In coarse
sediments where individual pebbles can be handled, and in microscopic 
examination of loose grains, number frequency is used. 
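Hatch's result can be checked numerically. The sketch below assumes that grain weight is proportional to d³ (constant shape and density), works in natural-log units x = ln d, and weights a normal number-frequency curve in x by e^{3x}. Completing the square gives the standard (Hatch-Choate) answer: the weighted curve is again normal with the same standard deviation σ and a mean shifted by 3σ², which on the phi scale is a shift toward the coarse end. The function name is hypothetical.

```python
import math

def weight_weighted_log_moments(mu, sigma, power=3, steps=20001, half_width=10.0):
    """Mean and standard deviation of x = ln(d) after weighting a normal
    number-frequency curve in x by d**power (power=3: weight proportional to volume).

    mu, sigma -- mean and standard deviation of ln(d) for the number frequencies
    """
    lo = mu - half_width * sigma
    h = 2 * half_width * sigma / (steps - 1)
    xs = [lo + i * h for i in range(steps)]
    # Unnormalised normal density in x multiplied by the grain weight e^(power * x).
    w = [math.exp(-0.5 * ((x - mu) / sigma) ** 2 + power * x) for x in xs]
    total = sum(w)
    mean = sum(wi * x for wi, x in zip(w, xs)) / total
    var = sum(wi * (x - mean) ** 2 for wi, x in zip(w, xs)) / total
    return mean, math.sqrt(var)

if __name__ == "__main__":
    mu, sigma = math.log(0.3), 0.5   # hypothetical sand: median 0.3 mm by number
    mean_w, sd_w = weight_weighted_log_moments(mu, sigma)
    print(mean_w - mu, 3 * sigma ** 2)  # numerical shift vs. the closed form
    print(sd_w, sigma)                  # log standard deviation is unchanged
```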


TABLE 2
SEDIMENTARY APPLICATIONS OF STATISTICAL DATA

Application | Examples
1. Graphic presentation | Histograms, cumulative curves, frequency curves, scatter diagrams, etc.
2. Summarized description and comparison | Summaries of means, standard deviations, and other parameters; tests of relations among sediments.
3. Classification of sediments | Statistical data as a basis for textural and other groupings of sediments.
4. Study of sedimentary characteristics as functions of time and space | Graphs and maps showing systematic changes in sediments along streams, beaches, or over areas. Experimental design as a basis for studying population gradients and changes.
5. Use of statistical data in development and testing of dynamic theories of sediment behavior and variation | Comparison of field and laboratory data on sediment transportation and deposition with theories of sediment behavior derived in part from stage (4).

Frequency is always the dependent variable in size analysis and al- 
though the physical interpretation of the data may vary with the man- 
ner of expressing frequency, the geometrical significance of the statisti- 
cal measures is the same regardless of the particular choice of frequency 
expression. The log transformation does not affect the relative area 
under any grade size block and hence is independent of the kind of 
frequency used. 

The statistical data of particle size analysis have been used in various 
ways. Table 2 summarizes these uses, which in a broad way are the 
same as for other sciences. Most effort in the study of sediments has 
been directed toward the first three categories. These descriptive uses
have been necessary in the sense that the characteristics of sediments 
had first to be determined, and studies were needed of the areal varia- 
tions in these characteristics over any given sedimentary environment. 
Once the range of values observed in nature was determined and some 
insight gained into the patterns of areal variation, it became possible 
to relate these observations to theory. In some instances theory pre- 
cedes practice, and the observational data are used to test the theoreti- 
cal structure. 


MINERAL ANALYSIS OF SAND 


The composition of sediments is expressed either in terms of chemical 
or mineralogical composition. Among sediments most extensively 
studied for mineral composition are sands, and in this section these 
sediments will be used as an example. 

Most sand deposits are composed predominantly of quartz, but 
nearly all sands have small amounts of dark minerals which are im- 
portant in sediment interpretation. These minerals have a greater 
specific gravity than quartz and are separated from it with heavy 
liquids. The heavy minerals, which are thus separated out, may com- 
prise from 0.1 to 5 or 10 per cent of the sand. 

In analyzing the heavy minerals, it is found that they may consist of a few to a dozen or more species. A sample of several hundred to a thousand grains is studied under the microscope and the frequency of the
species present is indicated as a number percentage. Inasmuch as there 
is no gradation among the mineral species, the distributions are multi- 
variate with discretely measured variables. 
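Because each grain counted either belongs to a given species or does not, a number percentage behaves approximately like a binomial proportion. The sketch below, assuming independent grains and the normal approximation, gives the standard error and the corresponding probable error (0.6745 times the standard error) of a percentage estimated from n grains. It illustrates the grain-count question discussed in the next paragraph but does not reproduce any particular published calculation; the names are hypothetical.

```python
import math

PROBABLE_ERROR_FACTOR = 0.6745  # probable error = 0.6745 x standard error (normal curve)

def count_percentage_errors(p_percent, n_grains):
    """Standard error and probable error, in percentage points, of a number
    percentage p_percent estimated by counting n_grains grains."""
    p = p_percent / 100.0
    se = 100.0 * math.sqrt(p * (1.0 - p) / n_grains)  # binomial standard error
    return se, PROBABLE_ERROR_FACTOR * se

if __name__ == "__main__":
    for pct in (50, 10, 1):
        se, pe = count_percentage_errors(pct, 300)
        print(f"{pct:>3} per cent of 300 grains: s.e. {se:.2f}, p.e. {pe:.2f}")
```

With 300 grains, a species making up 10 per cent of the suite has a standard error of about 1.7 percentage points, while a 1 per cent species has one of about 0.57 points, more than half its own value, which is why the rarer frequencies are the difficult ones.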

The heavy mineral data are used partly to determine the kinds of 
source rocks which supplied the sediments, and partly to help decide 
whether two layers of sediment may be stratigraphically equivalent. 
Various statistical devices have been developed for these purposes. 

In many early studies of heavy minerals, the relative abundance of 
species was indicated by such terms as “common,” “rare,” etc., al- 
though the use of number or percentage frequencies became common 
in the early 1920’s. Problems quickly arose regarding the number of 
grains to be counted in order to avoid undue errors in estimating the 
rarer grain frequencies. Dryden [8] applied probable error theory to 
the problem in 1931 and concluded that about 300 grains should be 
counted. A second problem, investigated in 1935 by Dryden [9], con- 
cerned the comparison of heavy mineral suites among samples from 





58 AMERICAN STATISTICAL ASSOCIATION JOURNAL, MARCH 10954 


different strata to test the geological equivalency of the beds. His ap- 
proach led to some discussion with Eisenhart [10] regarding the relative 
advantages of the correlation coefficient and the Chi-square test. Eisen- 
hart’s discussion clarified an important question in treating such data, 
and furnishes an excellent example of the contributions which statisti- 
cians can make to subject matter fields. 
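The Chi-square side of that discussion can be illustrated with the usual test of homogeneity for a 2 x k table of grain counts, one row per sample and one column per mineral species. The sketch below is a generic version of that test, not a reconstruction of Eisenhart's own computation, and the counts in the usage lines are hypothetical. The statistic is referred to a Chi-square table with k − 1 degrees of freedom; where SciPy is available, `scipy.stats.chi2_contingency` gives the same statistic, except that by default it applies Yates's continuity correction when there is only one degree of freedom.

```python
def chi_square_homogeneity(counts_a, counts_b):
    """Pearson Chi-square statistic for a 2 x k table of grain counts.

    counts_a, counts_b -- counts of the same k mineral species in two samples
    Returns (statistic, degrees_of_freedom).
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("both samples must list the same species")
    col_totals = [a + b for a, b in zip(counts_a, counts_b)]
    if any(c == 0 for c in col_totals):
        raise ValueError("drop species absent from both samples")
    grand = sum(col_totals)
    stat = 0.0
    for row in (counts_a, counts_b):
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand  # expected count if the suites are alike
            stat += (observed - expected) ** 2 / expected
    return stat, len(col_totals) - 1

if __name__ == "__main__":
    # Hypothetical counts of four heavy mineral species in two beds
    print(chi_square_homogeneity([120, 60, 90, 30], [100, 70, 110, 20]))
```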

In 1944 Allen [1] reviewed the problem of applying statistical meth- 
ods to mineralogical data. In 1949 he [2] applied the methods to a 
study of mineral variations in some Cretaceous deposits in England. 
Maps were presented showing areal variation in heavy mineral per- 
centages, and scatter diagrams with regression lines showed relations 
among mineral composition, particle size, etc. Allen demonstrated that 
certain “patchy” occurrences of minerals were due to small-scale, nearly random processes, which, however, did not seriously affect the regional
picture of mineral variation. 

Allen’s work, in common with other heavy mineral studies, was di- 
rected toward showing the provenance (place of origin and kind of par- 
ent rock) of the sediments and the main directions of material trans- 
port. Other studies are directed toward determination of the “stability” 
of minerals, i.e., their tendency to persist for long distances or times of 
transport. Various authors have explored this problem statistically; 
Pettijohn [26] showed that an order of mineral stability could be 
erected which indicates the relative persistence of heavy minerals due 
to resistance to abrasion, solution, or decomposition. 

The statistical nature of heavy mineral distributions has not been 
given intensive treatment in the literature. Most sands consist of a 
large preponderance of quartz (plus detrital chert, feldspar, and some 
other “light minerals”) and small amounts of heavy minerals. All the 
minerals have discrete number frequency distributions. The more 
abundant ones follow the binomial law, and the rarer ones appear to be 
Poisson distributions. Many minerals show a normal distribution of 
mean percentages in closely spaced samples, as suggested in Table 1. 

Observed and expected distributions of pebble frequencies in Lake 
Michigan beach gravel are shown in Table 3, based on studies by the 
writer. One hundred samples of 10 pebbles each were drawn at random 
from a small beach area, and the occurrences per sample of each rock 
type were noted. The gravel consists of about 50 per cent limestone, 35 
per cent chert, 10 per cent basalt, and 5 per cent granite. The limestone 


¹ I. W. Burr, in his oral discussion of this paper, suggested that the data of rarer minerals may be binomial distributions of low probability rather than Poisson. This point is considered below.
and chert values agree with their expected binomial distributions with P of the order of 0.50 by a Chi-square test. The limestone data are shown in the left-hand part of Table 3. The granite and basalt data agree with expected Poisson distributions, again with P exceeding 0.50. Following Burr’s suggestion, however, the granite data were also compared with a binomial of low probability, and the Chi-square test yielded a P of about the same value. The right-hand part of Table 3 shows the granite


TABLE 3
LITHOLOGIC COMPOSITION OF LAKE MICHIGAN BEACH PEBBLES

Number of Occurrences per Subsample | Limestone: Observed, Expected (binomial) | Granite: Observed, Expected (Poisson), Expected (binomial)
[Table values not recoverable.]

data and expected values for both Poisson and low-probability binomial distributions.
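The expected frequencies in such a comparison follow directly from the composition given above. This sketch assumes 100 subsamples of 10 pebbles, a limestone proportion of 0.50, and a granite proportion of 0.05, giving a Poisson mean of 0.5 granite pebbles per subsample. The helper names are illustrative. In practice, classes with very small expected counts are usually pooled before the Chi-square statistic is formed.

```python
import math

def binomial_expected(k, n=10, p=0.05, subsamples=100):
    """Expected number of subsamples with exactly k pebbles of one rock type
    when each pebble independently has probability p of being that type."""
    return subsamples * math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_expected(k, mean=0.5, subsamples=100):
    """Expected number of subsamples with exactly k pebbles under a Poisson law."""
    return subsamples * math.exp(-mean) * mean ** k / math.factorial(k)

def chi_square(observed, expected):
    """Pearson Chi-square statistic over matching classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

if __name__ == "__main__":
    print("k  limestone(binomial)  granite(binomial)  granite(Poisson)")
    for k in range(4):
        print(k, round(binomial_expected(k, p=0.5), 1),
              round(binomial_expected(k), 1), round(poisson_expected(k), 1))
```

For granite the two laws differ only slightly in the low classes (for zero granite pebbles, about 59.9 subsamples under the binomial against 60.7 under the Poisson), which is consistent with the two Chi-square tests giving similar P values.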

The writer has not explored these implications fully, but Burr’s sug- 
gestion opens a fresh viewpoint in the study of sediment composition. 
It furnishes an additional example of the contributions which statisti- 
cians can make to subject matter investigations. Geologically the ques- 
tion is important because it affects interpretation of mineral data from 
samples collected along lines of natural transport, as in streams. 
Abundant minerals of low stability, which display binomial distribu- 
tions near their source, become depleted as solution and abrasion act 
on them during transport. Do the binomial distributions merely change 
by decrease of p, or is there some point at which the binomial law gives
way to a Poisson law? 
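One way to frame this question numerically is to hold the subsample size fixed and let p fall. The sketch below, assuming subsamples of 10 grains (an arbitrary choice for illustration), measures the total variation distance between Binomial(n, p) and Poisson(np). Le Cam's inequality guarantees that this distance is at most np². The distance shrinks steadily as p decreases rather than vanishing at a threshold, so the mathematics alone points to a gradual transition; it does not settle the geological question.

```python
import math

def binomial_pmf(k, n, p):
    """Binomial probability of exactly k successes in n trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """Poisson probability of exactly k events with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def total_variation(n, p, k_max=60):
    """Total variation distance between Binomial(n, p) and Poisson(n * p),
    neglecting the Poisson tail beyond k_max (negligible for small n * p)."""
    lam = n * p
    diff = sum(abs((binomial_pmf(k, n, p) if k <= n else 0.0) - poisson_pmf(k, lam))
               for k in range(k_max + 1))
    return diff / 2

if __name__ == "__main__":
    n = 10
    for p in (0.3, 0.1, 0.05, 0.01):
        print(f"p = {p:<5} distance = {total_variation(n, p):.4f}  bound n*p^2 = {n * p * p:.4f}")
```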

It is evident from the preceding discussion that much remains to be 
learned about the multivariate distributions of minerals in sedimentary 
deposits. Experimental design has entered the field only slightly, and 
there is ample opportunity for fundamental research in this aspect of 
sedimentary petrology. 


RELATIONS AMONG SEDIMENTARY PROPERTIES 


A wide variety of sedimentary techniques has been used in studying 
relations among properties of sedimentary rocks. Scatter diagrams and 
correlation coefficients have been widely used in testing relations 
among particle size, shape, composition, and the like [18, Chapter 9].
Many sedimentary characteristics vary exponentially with distance 
from source, and relations between the properties themselves inde- 
pendent of distance are commonly power functions as may be expected. 
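The power-function relation follows from the exponential one. Assuming, for illustration, that two properties x and y decline with distance D from the source at rates α and β (symbols introduced here, not in the original), eliminating D gives:

```latex
\begin{align}
x &= x_0 e^{-\alpha D}, \qquad y = y_0 e^{-\beta D} \\
e^{-D} &= \left(\frac{x}{x_0}\right)^{1/\alpha} \\
y &= y_0 \left(\frac{x}{x_0}\right)^{\beta/\alpha}, \qquad
\log y = \log y_0 + \frac{\beta}{\alpha}\left(\log x - \log x_0\right)
\end{align}
```

Under these assumptions the two properties plot as a straight line of slope β/α on log-log paper, whatever the distance scale.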

In the study of ancient sediments, the conditions of origin must be 
wholly inferred from relations among sediment properties and enclosed 
fossil organisms. In part, present day sediments are investigated to 
provide some basis for such interpretation. Comparisons of sediments 
formed under known conditions also yield data on the extent to which 
similar sediments may be produced by different environments. Beach 
sand and dune sand provide an excellent example, inasmuch as the 
responsible agents are waves and currents on one hand and wind on the 
other. In many instances the dune sand is derived from beach sand by 
selective wind transport, so that a continuous gradation is commonly 
discernible between the two. The question whether a single unknown 
sample is either beach or dune sand usually cannot be answered, al- 
though some separation of the populations can be effected with a group 
of samples. 

Tests of relations among sedimentary properties commonly do not
include evaluation of experimental and other errors. Chief reliance has
been placed on the apparent spread or concentration of data in scatter 
diagrams. Where high correlation exists, and the data are not confused 
by use of inappropriate ratios, these analyses are probably sound. The 
problem of ratios and rates in studying geological data is one that re- 
quires further study. Some variables (such as grain sphericity and 
roundness) are defined in terms of ratios and yet show essentially nor- 
mal distributions. In some instances the use of ratios may disguise 
relations among raw data as Chayes [4] pointed out in 1949. 
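The ratio effect can be seen in a small simulation. Pearson showed in 1897 that ratios sharing a common denominator are correlated even when the numerators and the denominator are independent. The sketch below, assuming three independent variables uniform on [1, 2] and a fixed random seed, shows a raw correlation near zero and a ratio correlation of roughly 0.5; the function names are hypothetical.

```python
import math
import random

def pearson_r(a, b):
    """Ordinary product-moment correlation coefficient."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def ratio_correlation_demo(n=20000, seed=1):
    """Correlation of x with y, and of x/z with y/z, for independent x, y, z."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    x = [rng.uniform(1.0, 2.0) for _ in range(n)]
    y = [rng.uniform(1.0, 2.0) for _ in range(n)]
    z = [rng.uniform(1.0, 2.0) for _ in range(n)]
    raw = pearson_r(x, y)
    ratio = pearson_r([a / c for a, c in zip(x, z)], [b / c for b, c in zip(y, z)])
    return raw, ratio

if __name__ == "__main__":
    print(ratio_correlation_demo())
```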
Many geological studies suffer somewhat from failure to relate sample statistics to the corresponding population parameters. In many
instances the sample statistics are used directly without determining 
standard errors or without applying tests for normalcy. Fortunately, 
most sediment properties have sufficient variation areally so that the 
experimental errors do not unduly cloud the relations. Many sedi- 
mentary samples display slightly skewed distributions, and some are 
highly skewed. A large number of the former show a reasonable proba- 
bility of coming from normal populations (P commonly is greater than 
0.10 in Chi-square tests). In part the generalization of Table 1 is based 
on such tests. Many samples are sufficiently skewed to reduce the 
likelihood that they came from normal populations, but relatively little 
has been done to investigate the geological conditions which may pro- 
duce skewness. Similarly, the peakedness of some sedimentary distribu- 
tions is greater than normal populations show. As with skewness, little 
has been done on this aspect, although there are suggestions that long- 
continued movement and agitation of sediments by geological agents 
may produce highly peaked symmetrical size frequency curves. 
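Skewness and peakedness of this kind are usually measured by moment ratios. The sketch below computes the moment skewness √b₁ and kurtosis b₂ (equal to 3 for a normal population) together with the large-sample standard errors √(6/n) and √(24/n) that apply when sampling from a normal population; these approximations are rough for small n, and the function name is hypothetical.

```python
import math

def moment_measures(values):
    """Moment skewness sqrt(b1) and kurtosis b2, with their approximate standard
    errors sqrt(6/n) and sqrt(24/n) for samples from a normal population."""
    n = len(values)
    mean = sum(values) / n
    m2, m3, m4 = (sum((v - mean) ** r for v in values) / n for r in (2, 3, 4))
    skewness = m3 / m2 ** 1.5      # 0 for a symmetrical distribution
    kurtosis = m4 / m2 ** 2        # 3 for a normal distribution; larger means more peaked
    return skewness, kurtosis, math.sqrt(6 / n), math.sqrt(24 / n)

if __name__ == "__main__":
    # Hypothetical phi values for one sample
    phis = [1.2, 1.5, 1.6, 1.7, 1.8, 1.8, 1.9, 2.0, 2.1, 2.4, 3.1, 3.6]
    skew, kurt, se_skew, se_kurt = moment_measures(phis)
    print(f"skewness {skew:.2f} (s.e. {se_skew:.2f}), kurtosis {kurt:.2f} (s.e. {se_kurt:.2f})")
```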

A large number of measured characteristics of sediments show bi- 
modal or polymodal tendencies, which seem mainly to be the result of 
mixing effects under rapidly-shifting environmental conditions or of 
composite sampling which includes more than a single population. Con- 
sidering that some sedimentary laminae may be a millimeter or less 
thick, the mechanical process of obtaining an unmixed sample may pre- 
sent a serious problem. Otto [24] introduced the concept of a “sedimen- 
tation unit” in 1938 as a basis for critical sampling of thin units. 


SAMPLING PROBLEMS IN SEDIMENTATION 


The foregoing brief discussions of size and mineral data indicate 
some of the main lines of statistical development in sedimentation. 
Most other sedimentary attributes were quantified after size analysis 
had become established, so that they were able to profit by the statis- 
tical experience of earlier work. 

Several problems only partly solved are shared in common by all 
sedimentary fields. A principal one is that of sampling. It has been 
known for some time that the means of samples collected in a small area 
tend to be normally distributed for many sedimentary properties. If 
the samples from any one deposit are spread over a larger area, the dis- 
tribution of sample means tends to develop a larger variance or to be- 
come skewed. In part, these changes are related to areal variations in 
the population brought about by changes in the physical and chemical
conditions of sedimentation over the large area. Maps which bring out 
these systematic changes can be made of the mean values or of the vari- 
ances. 

Two kinds of sampling problems commonly arise in sedimentary 
studies. One is concerned with the small-scale local variations of the 
sediment, and the other involves the large-scale or regional variations. 
These problems are met in oil exploration, for example. Oil occurs in 
sedimentary rocks and shows some relation to optimum sedimentary 
conditions. In studying potential oil-bearing areas, how far apart should 
samples be spaced to bring out the regional sedimentary trends? How 
closely should they be spaced to bring out the local departures from the 
regional trends? In part, oil occurs in areas which show anomalous 
variations from the regional picture. As some guide to the scale of think- 
ing, regional maps may cover areas as great as 100,000 square miles. 
Regional samples from boreholes may be spaced about one per 500 
square miles, and maximum close spacing is about 60 wells per square 
mile in thoroughly explored areas. 
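For a rough sense of scale, the sketch below converts the sampling densities just quoted into distances, assuming, for illustration only, that samples lie on a regular square grid; real borehole patterns are irregular.

```python
import math

def grid_spacing(area_per_sample):
    """Distance between neighbouring points of a square grid on which each
    point represents area_per_sample (length units = square root of area units)."""
    return math.sqrt(area_per_sample)

REGIONAL_AREA_SQ_MI = 100_000     # area of a large regional map (from the text)
SQ_MI_PER_REGIONAL_SAMPLE = 500   # one borehole sample per 500 square miles
WELLS_PER_SQ_MI = 60              # densest spacing in thoroughly explored areas

regional_samples = REGIONAL_AREA_SQ_MI / SQ_MI_PER_REGIONAL_SAMPLE     # 200 samples
regional_spacing_mi = grid_spacing(SQ_MI_PER_REGIONAL_SAMPLE)          # about 22.4 miles
close_spacing_ft = grid_spacing(1 / WELLS_PER_SQ_MI) * 5280            # about 680 feet
```

On these assumptions the regional and close spacings differ by a factor of roughly 170 in distance.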

The study of recent sediments may be illustrated by the problem of 
sampling a beach 200 feet wide and several miles long. The deposits 
vary across the beach, along the beach, and with depth below the sur- 
face. The total volume of sediment may be of the order of 10*® cubic 
feet, and the number of grains in the population may be of the order 
of 10'*. Along with the sand samples, data are to be collected on beach 
slopes, wave energy, strength of currents, etc., so that relations between 
sand characteristics and geological processes can be studied. How many 
samples should be taken, how should they be distributed over the 
beach, and how deeply should each sample penetrate the sand layers? 

Solutions of such sampling problems have been largely empirical. If 
the upper layers are to be emphasized as being most nearly related to 
contemporary processes, closely spaced shallow samples commonly are 
collected along beach profiles spaced about } mile apart. An average of 
the samples along the profiles is used to characterize each 3-mile point 
along the beach. Presumably, the closely spaced samples compensate 
for local variations and the widely spaced averages bring out the 
sedimentary trends. 

Designed experiments, which include the sample layout and specify 
the number and kinds of samples necessary for any given study, seem 
to offer one of the best approaches toward the sampling problem. As 
analysis of variance methods become more familiar to geologists, it is 
likely that planned experiments will dominate over the somewhat less 
organized field studies which have been the rule in the past. Cochran’s
recent book [7], published since this manuscript was written, carries
many suggestions for stratified or systematic sampling plans which can 
be designed for particle populations. 

Some studies have been directed toward evaluation of sampling and 
laboratory errors in size and mineral analysis. The writer [16] studied 
the probable error of sampling beach sands in 1934; more recently 
Griffiths [13] applied analysis of variance to the same data to sharpen 
the concepts. In 1937 Otto [23] applied Shewhart’s theory of control to
the improvement of a splitter for obtaining subsamples from larger
field samples.
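A field-versus-laboratory error study of the kind described above can be framed as a one-way random-effects analysis of variance: field samples are the groups and repeated laboratory analyses of splits from each sample are the replicates. The sketch below assumes equal replication and independent errors. It is a generic textbook layout, not necessarily the design Griffiths used, and the names and data are hypothetical.

```python
def variance_components(groups):
    """One-way random-effects analysis of variance.

    groups -- list of lists; groups[i] holds the r replicate laboratory results
              for field sample i (equal r assumed)
    Returns (mean square between, mean square within,
             field variance component, laboratory variance component).
    """
    a = len(groups)
    r = len(groups[0])
    if a < 2 or r < 2 or any(len(g) != r for g in groups):
        raise ValueError("need at least 2 samples with the same number (>= 2) of replicates")
    means = [sum(g) / r for g in groups]
    grand = sum(means) / a
    ss_between = r * sum((m - grand) ** 2 for m in means)
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (a - 1)
    ms_within = ss_within / (a * (r - 1))
    lab_var = ms_within                                   # replicate (laboratory) variance
    field_var = max((ms_between - ms_within) / r, 0.0)    # truncated at zero by convention
    return ms_between, ms_within, field_var, lab_var

if __name__ == "__main__":
    # Hypothetical median diameters (mm) from duplicate analyses of four field samples
    print(variance_components([[0.31, 0.30], [0.27, 0.28], [0.35, 0.34], [0.29, 0.29]]))
```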

Most error studies include evaluation of laboratory errors (sample 
splitting, sieving, weighing, etc.), as well as field variation from sample 
to sample. As an indication of magnitudes, it was found that the laboratory error in particle size analysis of beach sand was only about one-eighth as
large as the field sampling error (0.54 against 4.5 per cent) for the 
average diameter. In heavy mineral studies of the same samples the 
average laboratory error, assigned mainly to counting errors, was about 
equal to the field sampling error. Each was of the order of 10 per cent. 
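If the field and laboratory figures are treated as independent standard errors (an assumption, since the text reports them simply as per cent errors), they combine as the square root of the sum of their squares. The arithmetic shows why the two studies lead to different priorities.

```python
import math

def combined_error(*components):
    """Root-sum-of-squares of independent error components expressed in the same units."""
    return math.sqrt(sum(c * c for c in components))

size_total = combined_error(4.5, 0.54)      # field and laboratory errors, per cent
mineral_total = combined_error(10.0, 10.0)  # both components of the order of 10 per cent
```

The size-analysis total, about 4.53 per cent, is barely larger than the field error alone, so better laboratory work would gain little there; for the heavy minerals the total, about 14 per cent, is roughly 40 per cent larger than either component, so both sources deserve attention.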

Griffiths and co-workers [11, 12, 31] applied two- and three-factor 
analysis of variance models to evaluation of sample, operator, and 
technique effects in grain orientation and porosity studies. The designs 
included evaluation of interactions. Perhaps the earliest analysis of 
variance study in the geological literature is that of Swineford and 
Swineford [33] published in 1946, in which a three-factor model was 
used to test the relative efficiency of sieve shaking equipment. 

It is in the field of experimental design that some of the more im- 
portant advances in statistical analysis of geological data will be made. 
Unlike many physical sciences, geology does not draw its observational data mainly from laboratory experimentation. Rather, the geologist
must rely on field observations of natural processes and deposits for his 
data. Experimentation plays its part in studies of sand movement in 
water flumes and wind tunnels. 


CONCLUDING REMARKS 


The original notes for this paper were prepared during the autumn 
of 1952. It was apparent at the time that rapid developments could be 
expected in application of newer methods of statistical analysis to 
geological data. Important papers on analysis of variance had appeared 
in several fields [3, 6, 22], problems of sampling were being examined 
more closely, and workers were beginning to relate sample statistics to 
population parameters [34]. The Earth Science Panel of the Committee on Statistics in the Physical Sciences of the A.S.A. had been organized and was bringing interested workers together. Early in 1953 a symposium of papers on statistics in geology was organized under R. L. Miller’s supervision by the Journal of Geology. The November, 1953, and
January, 1954 issues are devoted to the subject, and include several 
papers reviewing or extending applications of statistical analysis to 
sediments. 

In some respects the last several years mark a resurgence of statistical 
interest in geology, directed toward more critical analysis of sedimen- 
tary data, and especially toward applications of experimental design. 
This is in contrast to earlier principal interest in developing and adapt- 
ing techniques for description and classification of sediments, which 
reached its climax in late prewar years. A glance at the 1953 develop- 
ments in the expanded bibliography [12, 13, 19, 20, 21, 29, 31] will 
indicate the directions of some of these later trends in sedimentary 
statistical analysis. 

In the newer developments of statistical application, geologists are becoming increasingly aware that progress in advanced statistical analysis requires co-operation from statisticians. Opportunities for fuller collaboration between earth scientists and statisticians, such as are provided by the Committee on Statistics in the Physical Sciences, will do much to bring subject-matter and methodology groups together.
RELATIONSHIP BETWEEN AN INDEX OF HOUSE
PRICES AND BUILDING COSTS* 


David M. Blank
Columbia University 


Deflation of residential wealth and residential construction expenditure estimates to constant dollar levels in principle requires the use of a price index of residential construction. However, no national market price index covering a reasonably long period of time exists, although house price indexes have been constructed for several cities, usually covering relatively few years. Consequently, a construction cost index is typically used as a substitute, on the view that the movement of such an index is a reasonable reflection of changes in new house prices. This article attempts to assess the validity of this assumption and to judge the margins of error involved in employing a construction cost index as a deflator.


POSSIBLE DIVERGENCE BETWEEN COST AND PRICE INDEXES 


It could be reasoned that significant short- and long-term divergences
might arise between a valid index of the market price of homes and 
indexes of construction cost. These divergences can be assigned to two 
causes: technical problems in defining and measuring construction 
costs and real deviations between new and old house prices. 

Construction cost indexes usually exclude builders’ profits and, often, 
overhead charges, or add a constant percentage to direct cost to cover 
these items. The apparently wide short-term variability in builders’ 
profits thus permits significant differences to arise between the move- 
ment of prices of new homes and cost indexes. To the extent that there 
has been a secular movement of builders’ profits and overhead costs 
which is not taken account of in construction cost indexes, even secular 
divergences may arise. 

More importantly, technical problems inherent in devising construc- 
tion cost indexes involve at least the possibility of deviations between 

* Some of the material contained in this article is derived from a study of Capital Formation in 
Residential Real Estate by Leo Grebler, David M. Blank, and Louis Winnick, which is to be published 
by the National Bureau of Economic Research. 

1 For example, Toledo—William Hoad, Real Estate Prices (unpublished doctoral dissertation, Uni- 
versity of Michigan, 1942); Washington—unpublished data from the Housing and Home Finance 
Agency, quoted in Ernest M. Fisher, Urban Real Estate Markets: Characteristics and Financing (New 
York: National Bureau of Economic Research, 1951), p. 54; Ann Arbor—Herman Wyngarden, “An Index of Local Real Estate Prices,” Michigan Business Studies (Ann Arbor: University of Michigan, January 1927).
such indexes and a true price index of new homes. Most residential
construction cost indexes apparently are derived as some form of 
weighted average of materials prices and wage rates. They differ in the 
number of materials and labor skills covered and in the degree to which 
the weights are based on specific, rather than generalized, types of con- 
struction. The weights are usually unchanged, or changed little, over 
the entire period covered. Such indexes suffer from several defects. One 
such defect results from the fact that indexes of this kind cannot take 
fully into account the changing importance or the relative price movements of new materials and equipment which have been added to the
house over time. In addition, for early years there is a serious question 
as to whether actual prices and wage rates, rather than nominal prices 
and rates, have entered into such indexes. Finally, such indexes are 
unable to take into account changes in site productivity. 
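A fixed-weight cost index of the general type described here can be sketched as a weighted average of price relatives. The inputs, weights, and prices below are hypothetical, and actual indexes differ in coverage and weighting. The sketch only illustrates the structural point made above: an input absent from the base-period weights, such as newly added equipment, can never influence the index.

```python
def fixed_weight_cost_index(prices, base_prices, weights):
    """Cost index (base period = 100) as a weighted average of price relatives.

    prices, base_prices -- dicts: input name -> price or wage rate
    weights             -- dict: input name -> share of base-period cost (shares sum to 1)
    Only inputs listed in `weights` can affect the result.
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return 100.0 * sum(w * prices[k] / base_prices[k] for k, w in weights.items())

if __name__ == "__main__":
    # Hypothetical inputs and weights, for illustration only
    weights = {"lumber": 0.3, "brick": 0.2, "carpenter": 0.3, "bricklayer": 0.2}
    base = {"lumber": 40.0, "brick": 12.0, "carpenter": 0.50, "bricklayer": 0.60}
    later = {"lumber": 80.0, "brick": 12.0, "carpenter": 0.50, "bricklayer": 0.60,
             "oil_burner": 150.0}  # a new item with no weight is simply ignored
    print(fixed_weight_cost_index(later, base, weights))
```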

If these technical problems were solved, cost indexes would properly 
measure the changes in prices of new homes. However, discrepancies 
between such cost and price indexes and an index of old home prices 
could still arise. Because of the interconnection between the markets 
for new and old homes their price movements should be in close con- 
formity at most times. Nevertheless, divergences could appear at the 
trough of the building cycle when the prices of existing homes may 
sink below the price at which new houses would be offered on the mar- 
ket if there were any building activity. Indeed, it is for this reason that 
construction volume sometimes declines to more or less negligible levels. 
Discrepancies could also appear in the upswing of the cycle during 
short periods when new construction lags behind the increase in demand 
for dwelling units; existing houses may command premiums at such 
times because of their immediate availability. At either cycle stage, 
the divergences may last as long as several years. 


A NEW PRICE INDEX, 1890-1934 


To test the differences in movement between a representative con- 
struction cost index and the prices of homes, a house price index for 
1890-1934 was developed and compared with the cost index. The data 
for the price index were derived from the Financial Survey of Urban 
Housing,² which presented financial data and other information for a
sample of residential structures in 61 cities in 1934. Detailed informa- 
tion in the Survey is available only for 22 cities. The 22 cities are widely 
scattered geographically, with at least two cities representing each of 





2 Financial Survey of Urban Housing (Washington, D. C.: U.S. Department of Commerce, 1937). 
the nine census divisions except the East South Central division which
had only one city in the 22-city sample. One set of questions asked of 
each owner of a residential structure related to: (a) value of the prop- 
erty in 1934, (b) year of acquisition by the then-present owner, and 
(c) original cost to owner at time of acquisition. This information was 
summarized for each city and a table presented for each of the 22 cities, 
listing the number of properties included in the 1934 sample which 
were acquired in each year from 1890 to 1933, the total acquisition cost 
of properties acquired in each such year and the value of each group of 
such properties in 1934. Separate data for all owner-occupied and all 
tenant-occupied structures and for all single-family owner-occupied 
houses and all single-family tenant-occupied houses were presented, 
rather than over-all figures for all residential properties. 

The data selected for analysis were those relating to single-family 
owner-occupied houses, on the view that this relatively homogeneous 
group which comprises the most important portion of the nonfarm 
housing stock would show a more consistent pattern than the other 
categories. The all owner-occupied category might have been a reason- 
able alternative, but was rejected because it was less homogeneous 
than the single-family owner-occupied category. The two tenant- 
occupied segments were rejected because they included too small a 
number of properties and because the all tenant-occupied group was too 
heterogeneous. The tenant- and owner-occupied data could not be 
combined as they were based on two separate samples and the size of 
the two samples did not reflect the proportion of owner-occupied 
and tenant-occupied properties in the respective cities. 

A relative for each year was calculated for each city, based on the 
ratio of the total acquisition cost of the single-family owner-occupied 
houses acquired in each given year in a given city to their value in 1934. 
The median relative for each year was then determined.³ This series of
median relatives, based on 1934 values equal to 100, was converted to 
a 1929 base; the converted series is presented in Table 1. 
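The steps just described (a cost-to-1934-value relative for each city and year, the median across cities, and a shift from a 1934 base to a 1929 base) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's computation: the city names and figures in `CITY_DATA` are hypothetical, each record is assumed to hold (number of properties, total acquisition cost, total 1934 value) for one city and acquisition year, and the four-property cut-off follows the rule stated in the note to Table 1.

```python
from statistics import median

MIN_PROPERTIES = 4  # city relatives based on 3 or fewer properties are excluded

# Hypothetical city tables: year -> (properties, total acquisition cost, total 1934 value).
CITY_DATA = {
    "city_a": {1909: (6, 30_000, 45_000), 1929: (8, 80_000, 64_000)},
    "city_b": {1909: (5, 20_000, 25_000), 1929: (10, 120_000, 100_000)},
    "city_c": {1909: (3, 9_000, 10_000), 1929: (7, 70_000, 50_000)},
}


def median_relatives(cities):
    """Median across cities of acquisition cost as a percentage of 1934 value (1934 = 100)."""
    by_year = {}
    for records in cities.values():
        for year, (n_props, cost, value_1934) in records.items():
            if n_props >= MIN_PROPERTIES:
                by_year.setdefault(year, []).append(100.0 * cost / value_1934)
    return {year: median(rels) for year, rels in sorted(by_year.items())}


def rebase(series, base_year):
    """Rescale a series so that base_year equals 100."""
    base = series[base_year]
    return {year: 100.0 * value / base for year, value in series.items()}


index = rebase(median_relatives(CITY_DATA), 1929)
```

With these made-up figures, city_c's 1909 record is dropped for having only three properties, so the 1909 median comes from two cities and the rebased 1909 figure is about 58.7.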

The assumptions underlying the price index warrant clarification before
any comparisons are drawn. It is assumed, first, that the acquisition cost
estimates are reasonably accurate. In all likelihood, the estimates of
acquisition cost for properties acquired in the early years of the period
studied have some margin of error. It is also assumed that the year of
acquisition has been accurately reported; here again, there undoubtedly
are significant error margins for the early years, with a tendency for
respondents to report acquisitions in years which are multiples of five.
Finally, it is assumed that the movement between median relatives of two
successive years approximates the movement in prices of a single sample
between the two years; it will be remembered that each relative, before
conversion to a 1929 base, actually represents the movement in prices of
a separate sample between the given year and 1934.

3 To determine the effect of the specific averaging procedure on the final results, a test was
performed on the data for a single year in each of the four full decades covered. The relatives for
each such year were combined in the form of the median, positional mean, unweighted arithmetic mean,
unweighted geometric mean, and weighted arithmetic mean (in which the weights were the number of
households in each city at the nearest censal year). The range of results in each year was relatively
small, so that the simplest measure, the median, was used in the computations for the final series.
Individual city relatives based on fewer than four properties were disregarded in the computation of
the median.


TABLE 1

UNADJUSTED PRICE INDEX OF ONE-FAMILY OWNER-
OCCUPIED HOUSES, 22 CITIES, 1890-1934
(1929 = 100)*

Year    Index        Year    Index
1890    61.3         1915
1891    55.3         1916
1892    56.3         1917
1893    58.7         1918
1894    68.          1919
1895    62.          1920
1896    53.          1921
1897    55.          1922
1898    59.          1923
1899    56.          1924
1900    64           1925
1901    54.          1926
1902    63.          1927
1903    64           1928
1904    67.          1929    100.0
1905    59           1930
1906    70.          1931
1907    77.          1932
1908    70.          1933
1909    68.7         1934
1910    74.2
1911    72.5
1912    75.3
1913    75.3
1914    78.1

* Yearly median of 22 city relatives, excluding those relatives based on 3 or fewer properties.

The validity of the 1934 value estimate probably does not seriously 
affect the movement of the price index, except for the 1934 value itself. 
It would only affect this movement if the degree of underestimate or
overestimate of value in 1934 were correlated with length of holding.

It should be pointed out that the constructed price index applies to 
both new and old houses. The relative for a given year relates the 
acquisition cost of properties purchased in that year to their value in 
1934, regardless of whether the acquisition was of a new or an old struc- 
ture. A cursory examination of the data indicates that somewhat more 
than one-half of the properties in the 1934 sample which were acquired 
in the 1890-99 decade were new houses; somewhat more than one-third 
in the 1900-09 decade were new houses; and somewhat more than one- 
fifth in the remaining years were new houses. It was suggested earlier 
that there should be no reason for any difference in the price movement 
of new and old houses, other than in periods of depressed building 
activity or for short periods during the upswing of the cycle when con- 
sumer ignorance and relative availability may play a role. At other 
times, the movement of prices of houses of varying age and quality 
should be roughly similar. And the price variations of new housing, 
once it has entered the housing stock, should be the same as those of the
original stock, subject to the same differential depreciation rates as apply to an
existing housing stock composed of structures of different ages. 

The index in its present form is subject to two major offsetting biases, 
viz., value losses due to depreciation and obsolescence and value incre- 
ments in the form of structural additions and alterations. The price 
relative for 1904, for example, before conversion to a 1929 base, meas- 
ures the change in price of a given set of properties between 1904 and 
1934; this change is affected by the thirty years of depreciation operat- 
ing on these properties and is somewhat smaller than the change in 
price which would be measured if this group of properties in 1934 had 
the same age structure as they did in 1904. Conversely, any structural 
additions or alterations to the properties between time of acquisition 
and 1934 would tend to make the price rise larger between these two 
periods than the theoretically correct price movement. 

It is generally accepted that value losses due to depreciation and 
obsolescence typically outweigh value gains due to additions and 
alterations.⁴ Therefore, the present index must be biased downward
as the net result of these two kinds of value change. 
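A small numerical sketch makes the direction of the bias concrete. All parameters are hypothetical: prices of houses of constant age are assumed to double between the acquisition year and 1934, and the houses are assumed to lose value at a net compound rate (depreciation and obsolescence less additions and alterations) of 1.25 per cent a year over a 30-year holding. The 1934 value then understates the price change, the early-year relative comes out too high, and the measured rise to 1934 is too small.

```python
def observed_relative(true_price_ratio, years_held, net_depr_rate):
    """Acquisition cost as a percentage of 1934 value when the 1934 value
    reflects both the market price change and net depreciation.

    true_price_ratio: 1934 price / acquisition-year price for houses of constant age.
    net_depr_rate: depreciation net of additions and alterations, compound per year.
    """
    remaining = (1.0 - net_depr_rate) ** years_held
    return 100.0 / (true_price_ratio * remaining)


true_relative = 100.0 / 2.0                      # what a constant-age comparison would give
biased_relative = observed_relative(2.0, 30, 0.0125)

# A higher relative for the early year means a smaller measured rise to 1934.
measured_rise = 100.0 / biased_relative         # about 1.37 instead of 2.0
```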





4 See Leo Grebler, David M. Blank, and Louis Winnick, Capital Formation in Residential Real
Estate—Trends and Prospects (National Bureau of Economic Research, forthcoming), Appendix E. 
Further corroboration for this view is found in a comparison of two
sets of house price indexes for Cleveland and Seattle (Table 2). One
set of indexes comprises the series of relatives for these two cities,
which, together with the relatives for the remaining 20 cities, provided
the basis for calculating the 22-city price index. These indexes are
subject to the same biases for depreciation and additions as the 22-city
index itself.


TABLE 2

HOUSE PRICE INDEXES, CLEVELAND AND SEATTLE
(1929 = 100.0)

                Cleveland                         Seattle
        Garfield-Hoad  Price Index        Garfield-Hoad  Price Index
Year    Price Index    Underlying         Price Index    Underlying
                       22-City Index                     22-City Index
        (1)            (2)                (3)            (4)
1907    35.4           64.7
1908    36.6           60.8
1909    40.2           66.5               56.9           76.4
1910    43.9           59.1               58.8           74.4
1911    45.1           57.7               56.9           82.9
1912    46.3           62.0               64.7           73.6
1913    47.6           63.8               62.7           78.0
1914    50.0           72.2               64.7           86.9
1915    51.2           70.0               66.7           86.9
1916    53.7           71.0               64.7           77.7
1917    58.5           77.2               62.7           76.3
1918    67.1           89.7               66.7           82.1
1919    76.8           89.6               78.4           92.6
1920    86.8           104.7              88.2           95.7
1921    87.8           102.9              86.3           92.5
1922    91.5           104.6              99.8           88.3
1923    96.3           101.1              100.0          94.2
1924    100.0          113.1              117.6          96.7
1925    102.4          112.9              109.8          102.9
1926    103.7          114.5              107.8          98.0
1927    102.4          106.1              99.9           98.2
1928    101.2          111.0              102.0          99.6
1929    100.0          100.0              100.0          100.0
1930    95.1           94.3               88.2           92.5

Sources:
Columns 1 and 3: Index derived from 3-year moving averages of prices paid for new six-room frame
house and lot. Garfield and Hoad, op. cit.
Columns 2 and 4: Index for prices of 1-family owner-occupied homes, derived from data in Financial
Survey. Each is one of the 22 city indexes underlying the 22-city price index.

The second set of indexes was derived in such a manner as to exclude
any such bias. They are based on three-year moving averages of prices 
paid for new owner-occupied single-family homes in Cleveland and 
Seattle, derived by Frank R. Garfield and William M. Hoad from 
special tabulations of unpublished data from the Financial Survey of 
Urban Housing. From these tabulations Garfield and Hoad were able 
to compute average prices paid for new homes (including the lots under- 
lying the structures) of specified types in each city in each year covered. 
The authors focus attention primarily on the price movement of five- 
and six-room frame houses, on the assumption that changes in the 
transaction-mix would affect the averages but little, since an analysis 
of the distribution of prices paid for various types of homes purchased 
in Cleveland in 1924 had indicated that these were relatively homo- 
geneous types of structures. The series for six-room frame houses in 
each city, converted to indexes with a 1929 base, are given in Table 2. 
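The smoothing and rebasing behind the Garfield-Hoad columns can be sketched as follows. The article does not say whether the three-year moving average was centered or trailing; this sketch assumes a centered average over consecutive years, and the prices are hypothetical.

```python
def centered_moving_average(prices, window=3):
    """Centered moving average over consecutive years; end years lacking a full window are dropped."""
    half = window // 2
    years = sorted(prices)
    out = {}
    for i in range(half, len(years) - half):
        span = years[i - half : i + half + 1]
        out[years[i]] = sum(prices[y] for y in span) / window
    return out


def to_index(series, base_year=1929):
    """Express a series as an index with base_year = 100."""
    base = series[base_year]
    return {year: 100.0 * value / base for year, value in series.items()}


# Hypothetical average prices paid for new six-room frame houses with lots.
prices = {1927: 7600.0, 1928: 7400.0, 1929: 7500.0, 1930: 7000.0, 1931: 6500.0}
index = to_index(centered_moving_average(prices))
```

A centered window loses one year at each end, so the five hypothetical price years yield index values only for 1928-1930.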

The properties underlying the Garfield-Hoad indexes may have been 
subject to changes in size and quality of structures and in land ratios 
which would result in divergences between these indexes and a valid 
house price index. But such changes were probably severely limited in 
extent due to the stated homogeneity over time of the houses with 
regard to size and type of structure and construction, i.e., 6-room single- 
family frame houses. And the restriction of the data to new houses 
specifically excludes any biases due to depreciation, obsolescence, or 
additions and alterations. 

A comparison between the two sets of indexes shows a significantly 
greater rise in the Garfield-Hoad indexes between the pre-World War 
I period and the late twenties than in the price indexes for Cleveland 
and Seattle underlying the 22-city price index. This difference is fully 
consistent with the existence of a downward bias in the 22-city index 
due to the effects of depreciation net of additions and alterations.

A detailed examination of empirical data, undertaken elsewhere, 
suggests that the decline in value of single-family houses over the first 
52 years of life, resulting from the net effect of depreciation and 
obsolescence on the one hand and additions and alterations on the 
other, approximates that resulting from a 1.2 per cent linear rate of 
depreciation.⁶ Since the 22-city index is based on movements in the
prices of structures plus land, the depreciation correction for this index
also requires a rate based on structures plus land. The relevant linear
rate, derived from the same data, is about 1.0 per cent.

All studies of the decline in market value of houses as they age clearly
indicate that a curvilinear rate of depreciation is more appropriate for
residential structures than a linear rate. The compound rate of
depreciation which yields about the same remaining value after 52 years
as a 1.0 per cent linear rate, but which approximates more closely the
path of declining value of residential structures as they age, is about
1⅜ per cent. Accordingly, the 22-city index was corrected for a 1⅜ per
cent compound rate of depreciation. The series so calculated, after
adjustment so that 1929 again equals 100, is presented in Table 3.

Generally speaking, the corrected price index shows an upward secular
drift from 1890 to about 1916, a more rapid rise to 1920, a smaller
rise to 1925, and a decline thereafter to 1933. Between 1890 and about
1925, short cycles of about four years in duration are discernible in the
data, with peaks appearing in 1894, 1900, 1904, 1907, 1910, 1914, 1920,
and 1925.⁷


PRICE INDEX COMPARED WITH CONSTRUCTION COST INDEX


No residential construction cost index covers the entire period from
1890 to 1934, but the Boeckh residential construction cost index, based
on 20 cities, starts in 1910 and can be extrapolated back to 1890 in a
customary fashion by the use of building materials and building wage
rate indexes. The Boeckh index is one of the few adequate construction
cost indexes available and is the only one aimed specifically at measuring
changes in cost of construction of residential structures.⁸

The combined index is presented in Table 4. The construction cost
index for 1890-1934 and the corrected house price index (Table 3) for
the same period are compared in Chart I. A comparison of the two
indexes suggests two important conclusions with regard to the
relationship between construction costs and house prices.

5 Frank R. Garfield and William M. Hoad, “Construction Costs and Real Property Values,” Journal
of the American Statistical Association, December 1937, pp. 643-53.

6 Grebler, Blank, and Winnick, loc. cit. The data were derived from a special study by the Federal
Housing Administration of a sample of single-family homes appraised by FHA in 1939; from William
M. Hoad, “Real Estate Prices, A Study of Residential Real Estate in Lucas County, Ohio,” unpublished
doctoral dissertation, University of Michigan, Ann Arbor, 1942; and from Raymond Goldsmith's
analysis in “A Perpetual Inventory of National Wealth,” Studies in Income and Wealth, Vol. XIV
(New York: National Bureau of Economic Research, 1951) of data gathered by the Financial Survey
of Urban Housing. It should be pointed out that a 1.2 per cent linear rate of depreciation for houses is
significantly below the rates presented in Bulletin F and used by the Department of Commerce and
most other investigators in this field, even after all adjustments are made for comparability.

7 The short cycle in house prices approximates closely in length the short cycle found in building
activity by Long. Clarence D. Long, Jr., Building Cycles and the Theory of Investment (Princeton, 1940),
p. 104.

8 E. H. Boeckh and Associates actually construct ten indexes for different types of structures,
both residential and nonresidential, for various cities. Two of these indexes, for frame and for brick
one- to six-family residential structures, for 20 cities have been combined by the several successive
federal housing agencies into a single residential cost index which is used by the Department of
Commerce in deriving the residential construction expenditure component of the deflated Gross
National Product series. It is this index which is referred to in the text.


TABLE 3

PRICE INDEX OF ONE-FAMILY OWNER-OCCUPIED HOUSES,
22 CITIES, CORRECTED FOR 1⅜ PER CENT COMPOUND
ANNUAL DEPRECIATION, 1890-1934
(1929 = 100.0)

Year    Index        Year    Index        Year    Index
1890    36.0         1910                 1930
1891    32.9         1911                 1931
1892    34.0         1912                 1932
1893    35.9         1913                 1933
1894    42.4         1914                 1934
1895    39.0         1915
1896    34.3         1916
1897    35.9         1917
1898    38.7         1918
1899    37.5         1919
1900    43.5         1920
1901    37.0         1921
1902    42.4         1922
1903    45.5         1923
1904    48.3         1924
1905    42.9         1925
1906    51.6         1926
1907    57.7         1927
1908    52.8         1928
1909    52.3         1929    100.0

Source: Index, Table 1, corrected for 1⅜ per cent compound annual depreciation.

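The depreciation arithmetic of this section can be checked with a short sketch. Two points are assumptions rather than statements from the article. The first is which compound convention is meant, so the sketch computes both a declining-balance rate and a discount-style rate that leave the same 48 per cent of value after 52 years as a 1.0 per cent linear rate. The second is the form of the correction: dividing each Table 1 entry by (1 + r) raised to the power (1929 − year), with r = 1⅜ per cent, which keeps 1929 at 100. For the Table 1 entries given to one decimal (1890-1893 and 1909), this form reproduces the corresponding Table 3 figures when Table 3's surviving column of figures is read as 1890-1909.

```python
# Linear schedule quoted in the text: 1.0 per cent of original value per year for 52 years.
LINEAR_RATE = 0.010
YEARS = 52
remaining_linear = 1.0 - LINEAR_RATE * YEARS           # 0.48 of original value

# Compound rates leaving the same fraction after 52 years, under two conventions
# (the article does not say which one it used).
r_declining = 1.0 - remaining_linear ** (1.0 / YEARS)  # value = (1 - r) ** n, about 1.40%
r_discount = remaining_linear ** (-1.0 / YEARS) - 1.0  # value = (1 + r) ** -n, about 1.42%

# Assumed correction form, with r = 1 3/8 per cent; 1929 is left unchanged at 100.
RATE = 0.01375


def corrected(unadjusted, year, rate=RATE, base_year=1929):
    """Deflate an unadjusted Table 1 entry for compound depreciation relative to base_year."""
    return unadjusted / (1.0 + rate) ** (base_year - year)


# Table 1 entries that are given to one decimal.
TABLE_1 = {1890: 61.3, 1891: 55.3, 1892: 56.3, 1893: 58.7, 1909: 68.7}
table_3 = {year: round(corrected(value, year), 1) for year, value in TABLE_1.items()}
```

Both convention-specific rates come out near 1.4 per cent, consistent with the text's "about" qualifier.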
Except for the period 1916-1922,⁹ the price index shows more short-run
variability than the cost index. The latter is quite stable over the
pre-1916 period, partly perhaps as a result of the way in which it is
constructed, while the price index shows substantial fluctuations.
Between 1905 and 1909, for example, the price index has a rise of more
than 34 per cent and a fall of almost 10 per cent, as compared with the
cost index, which rises only 15 per cent between 1905 and 1907 and
declines only 3 per cent between 1907 and 1908. The same relationship
holds for the period after 1922; the price index falls 5 per cent between
1925 and 1927 while the cost index remains almost unchanged. In
sum, it seems reasonable to conclude that in most periods the market
price of homes fluctuates more widely over the short run than do
construction costs as measured by standard construction cost indexes. As
a result, the annual movements of any construction series deflated by a
construction cost index are subject to some margin of error.

9 The cost index rises to a much sharper peak in 1920 than does the price index. This sharp rise in
1920 is found in all construction cost indexes and probably reflects a real difference in construction costs
and prices in that year. It seems to have been a result of a unique set of supply and transportation
difficulties in the winter and spring of 1920.


TABLE 4
RESIDENTIAL CONSTRUCTION COST INDEX, 1890-1934
(1929 = 100.0)

Year    Index        Year    Index
1890                 1915
1891                 1916
1892                 1917
1893                 1918
1894                 1919
1895                 1920
1896                 1921
1897                 1922
1898                 1923
1899                 1924
1900                 1925
1901                 1926
1902                 1927
1903                 1928
1904                 1929    100.0
1905                 1930
1906                 1931
1907                 1932
1908                 1933
1909                 1934
1910    53.
1911    52.
1912    53.
1913    51.
1914    52.

Sources:
1890-1906: 1907 value extrapolated by weighted average of an index of average wages per hour
in the building trades and an index of building materials prices. Wage index from Department of
Commerce and Labor, Bulletin of the Bureau of Labor, No. 77, July 1908; see Historical Statistics,
p. 66. Price index from Handbook of Labor Statistics, 1941 edition, Vol. 1; see Historical Statistics,
pp. 233-34. Weights: wages, 1.0; materials, 1.5. Weights derived from NHA analysis of housing
costs; see Housing Statistics Handbook, p. 32.
1907-1909: 1910 value extrapolated by weighted average of an index of wage rates in the building
trades and an index of building materials prices. Wage rate index from Bureau of Labor Statistics
annual reports, Union Wages and Hours in the Building Trades; see Historical Statistics, p. 69. Price
index from same source as 1890-1906. Weights same as above.
1910-1914: 1915 value extrapolated by Boeckh index of residential construction cost, as given in
Historical Statistics, p. 172.
1916-1934: Boeckh residential construction cost index, as given in Construction and Building
Materials, Statistical Supplement, May 1951, Department of Commerce, p. 40, converted to 1929 base.


CHART I. Price index of single-family owner-occupied houses, corrected for
depreciation, and residential construction cost index, 1890-1934. (1929 = 100.)

Source: Tables 3 and 4.

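The backward extension of the cost index described in the Table 4 source notes can be sketched as a proportional splice. The wage and materials weights (1.0 and 1.5) come from those notes; the component index values and the 1907 anchor value below are hypothetical, and carrying the anchor year's value back in proportion to the weighted average is an assumption about what the "customary fashion" involves.

```python
WEIGHTS = {"wages": 1.0, "materials": 1.5}  # weights given in the Table 4 source notes


def composite(wage_index, materials_index, weights=WEIGHTS):
    """Weighted average of the wage and building-materials indexes."""
    total = weights["wages"] + weights["materials"]
    return (weights["wages"] * wage_index + weights["materials"] * materials_index) / total


def extrapolate_back(anchor_year, anchor_value, wages, materials):
    """Carry anchor_value back to earlier years in proportion to the composite."""
    base = composite(wages[anchor_year], materials[anchor_year])
    return {
        year: anchor_value * composite(wages[year], materials[year]) / base
        for year in wages
        if year < anchor_year
    }


# Hypothetical component indexes (any common base) and a hypothetical 1907 cost-index value.
wages = {1905: 90.0, 1906: 95.0, 1907: 100.0}
materials = {1905: 92.0, 1906: 98.0, 1907: 100.0}
cost_index = extrapolate_back(1907, 45.0, wages, materials)
```

Only the movements of the components matter under this splice; their common base cancels out in the ratio.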
But equally important is the fact that the long-run movement of the 
two indexes is remarkably similar. Thus, the construction cost index in 
1921-1929 is about 245 per cent of its level in 1895-1905; the corrected 
price index in 1921-1929 is about 241 per cent of its level in
1895-1905. It must be remembered that the price data and depreciation data
underlying the corrected price index are derived from independent 
sources and that both are completely independent of the cost data 
underlying the construction cost index. In view of this independence of 
derivation, the almost identical long-run movement of the two series 
over four and a half decades argues strongly that the construction cost 
index measures with quite reasonable accuracy the secular movement of 
house prices.¹⁰

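The percentage comparisons above are ratios of index levels. The sketch below applies them to the corrected-index figures for 1895-1909, assuming that Table 3's surviving column of figures runs from 1890 to 1909. Reading "level in 1895-1905" as a simple arithmetic mean is also an assumption; under it, the 241 per cent figure would imply a 1921-1929 average of roughly 97.5.

```python
# Corrected index (Table 3), 1895-1909, reading the surviving column as 1890-1909.
TABLE_3 = {1895: 39.0, 1896: 34.3, 1897: 35.9, 1898: 38.7, 1899: 37.5,
           1900: 43.5, 1901: 37.0, 1902: 42.4, 1903: 45.5, 1904: 48.3,
           1905: 42.9, 1906: 51.6, 1907: 57.7, 1908: 52.8, 1909: 52.3}


def pct_change(series, start, end):
    """Percentage change in an index between two years."""
    return 100.0 * (series[end] / series[start] - 1.0)


def period_mean(series, first, last):
    """Simple arithmetic mean of the index over an inclusive range of years."""
    values = [series[y] for y in range(first, last + 1)]
    return sum(values) / len(values)


rise_1905_07 = pct_change(TABLE_3, 1905, 1907)     # "more than 34 per cent"
fall_1907_09 = pct_change(TABLE_3, 1907, 1909)     # "almost 10 per cent"
base_1895_1905 = period_mean(TABLE_3, 1895, 1905)  # about 40.5
implied_1921_29 = 2.41 * base_1895_1905            # only if "level" means a simple mean
```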

CONCLUSIONS 


The 22-city price index and the construction cost index show sig- 
nificant short-term divergences. These suggest that market prices of 
homes fluctuate more widely than construction costs, the difference in 
rise or fall perhaps amounting to as much as 10 per cent in a period of 
several years. For short-term analysis, then, some margins of error are 
involved in using the cost index as an approximation of a price index. 

With regard to long-term movements, however, the construction 
cost index conforms very closely to the price index, corrected for de- 
preciation. It would appear, therefore, that for long-term analysis the 
margin of error involved in using the cost index as an approximation 
of a price index cannot be very great. 





10 Only if there were major increases in site productivity not reflected in the construction cost 
index might this view be questioned. Although data on this question are extremely scanty, there is some 
evidence that the cost index is not subject to major error on this score, partly because the index does not 
fully reflect the historic increases in labor cost and partly because gains in site productivity of residential 
construction have been relatively limited, except perhaps in the very recent past. See Miles Colean and 
Robinson Newcomb, Stabilizing Construction: The Record and Potential (New York: McGraw-Hill, 
1952), pp. 69-74, 247-248. Also Grebler, Blank, and Winnick, op. cit., Appendix C. 














CYCLES IN THE BALANCE OF PAYMENTS* 


SOLOMON FABRICANT
New York University 


LOOKING back at the vast literature on international trade theory,
Jacob Viner noted in 1937 that the older discussions contained “only 
scattered and incidental references to the repercussions on the inter- 
national mechanism of cyclical fluctuations in business activity” 
(Studies in the Theory of International Trade, p. 432). But he observed 
also that “within the last few years” the question was being “more
seriously tackled;” and, as we now know, at the very time he wrote, this
spark of energy was already being blown into flame by the lively breath 
of Keynes’s General Theory. In the past decade and a half, however, 
little of this energy has been devoted to the “inductive spadework on 
the international aspects of business fluctuations” that Viner rightly 
felt was one of the steps necessary in a fruitful attempt to “incorporate 
cycle theory into the theory of international trade ... or to apply
international trade theory to cycle theory.” 

The statistical analysis by Chang is therefore one of the exceptions. 
Indeed, it is the only sustained attempt to determine in some system- 
atic fashion the characteristic cyclical behavior of the several items in 
the balance of payments of a variety of countries that has been pub- 
lished so far. Such enterprise merits applause—and in this case even 
wonder, for Mr. Chang, apparently working without statistical assist- 
ance, must have spent his days and his nights with the calculations. 

Necessary perspective on Mr. Chang’s work is provided if we start 
by asking ourselves what “inductive spadework on the international 
aspects of business fluctuations” should attempt to uncover. Suppose— 
let us dream—that we have unlimited time, money, and data. For any 
country, then, we want to know how each of the several items in its 
balance of payments fluctuates, and how each behaves during domestic 
and foreign business cycles. How well do changes in exports, for exam- 
ple, conform to changes in domestic business conditions? Do turns in 
exports usually lead, coincide with, or lag behind, turns in domestic 
business, or is their timing irregular; what is the average amplitude of 
fluctuation of exports, and the variation about this average, during 
domestic business cycles; do exports usually rise most vigorously during 
the early stages of domestic business expansion or during the later 
stages, or is there no systematic difference; does “quantum” of exports
fluctuate more or less than price of exports, and price of exports more
or less than price of imports and other goods? Are there any systematic
differences between the cyclical behavior of exports of raw materials
and those of fabricated products, and if so, what are the relative weights
of the several classes of goods, and how have long-run changes in these
weights affected the cyclical behavior of the total? And we ask, also,
if the behavior of exports seems about the same whether the domestic
cycle is short or long, mild or severe; and if there are differences in
cyclical behavior systematically related to the rate of growth of the
economy or its several parts, or to other secular or structural changes,
e.g., in tariff barriers and monetary standards. Then, looking over a
country’s borders at business conditions abroad, we consider how the
latter are related to domestic conditions and to fluctuations in the items
in its balance of payments. Having thus studied cycles in the balance
of payments of each country—or an adequate sample of countries—
we would go on to see whether the countries fall into homogeneous
groups, each with its characteristic type of balance of payments
fluctuation: whether, for example, “industrial countries” differ from
“raw-material producers,” or industrial countries heavily engaged in export
production from those in which production for export markets is
relatively small. And we would search also for similarities among countries
with respect to secular change in behavior.

* A review article of Cyclical Movements in the Balance of Payments, by Tse Chun Chang. Cambridge
(England): Cambridge University Press. 1951. Pp. x, 224. $3.75.

80 AMERICAN STATISTICAL ASSOCIATION JOURNAL, MARCH 1954

These questions would be put differently and in different order by 
a person with other tastes and theoretical predilections, but in one form 
or another they—and many other factual questions—would be in the 
list of every economist earnestly seeking light on the international 
aspects of business fluctuations. 

To answer these questions—remember, we are dreaming—we would 
use many rather long statistical series, and most of these would be on 
a monthly or quarterly basis. We would use a statistical apparatus that 
enabled us to study the shape of individual cycles in each series, as 
well as averages. We would relate developments in each country to 
those in countries closely tied to it by trade and finance, as well as to 
those in the rest of the world as a whole. 

It will be no surprise to the reader that Mr. Chang has answered few 
of our questions. Apart from the obvious reasons, others appear as we 
take stock of what he did. 

For each of six countries (Britain, the United States, Sweden, Aus- 
tralia, Chile, and Canada) Chang took the annual figures for each 
major item in the balance of payments (except the net gold and capital 
flow) during the period 1924-1938, eliminated trends, and determined 
—one at a time—the linear logarithmic equation relating the fluctuations
in each series to those in the presumed causal factors. Reading
these equations, he had the percentage by which each item in the bal- 
ance of payments changes on the average with a one per cent change in 
each of the “independent” factors. These factors were then related by 
a similar process to real “world income” (more exactly, the real income 
of the rest of the world). Simple substitutions in the equations gave 
him the percentage change in each item which could be expected if 
world income rose or fell by one per cent. Applying these percentage 
changes to the figures in a base period, he calculated the corresponding 
absolute amount of change in each item. The net difference among 
them gave him the absolute change that could be expected in net cap- 
ital exports, including gold and the residual error. The various equa- 
tions, and a tabulation of the absolute changes to be expected when real 
world income rises or falls by one per cent, constitute his statistical 
summary of the cyclical fluctuations in the balance of payments of 
each country. 
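Chang's summary procedure can be sketched for a single factor. This is a simplified illustration, not his computation: he fitted equations with several factors and removed trends first, and both steps are omitted here. The item series, its 1.8 elasticity, and the $1,000 million base value are hypothetical. The one figure taken from the review is the 2.15 per cent change in U.S. real income that accompanies a one per cent change in world real income. Because the equations are linear in logarithms, substituting one into another multiplies the elasticities.

```python
import math


def loglog_elasticity(x, y):
    """OLS slope of log(y) on log(x): the constant elasticity b in y = A * x**b."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    return sxy / sxx


# Hypothetical (already detrended) annual series: an item moving with the
# 1.8 power of U.S. real income.
income = [90.0, 95.0, 100.0, 104.0, 98.0, 92.0]
item = [5.0 * v ** 1.8 for v in income]
e_item_income = loglog_elasticity(income, item)  # recovers 1.8

# Substitution: per cent change in the item for a 1 per cent rise in world real income.
E_US_INCOME_WORLD = 2.15
pct_item_per_world_pct = e_item_income * E_US_INCOME_WORLD

# Absolute change, applying the percentage to a hypothetical base-period value.
BASE_VALUE = 1000.0  # millions of dollars
abs_change = BASE_VALUE * pct_item_per_world_pct / 100.0
```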

The results of this process of summarization may be illustrated by 
the American figures. The following table gives the regression
coefficients (i.e., the elasticities):








Elasticity with respect to 





U.S.     World    U.S.     World    U.S.     Relative
Real     Real     Money    Money    Import   Export
Income   Income   Income   Income   Priceᵇ   Priceᶜ





United States 

Quantity of imports ; ‘ — .97 
Quantity of exports 

Import Price 

Export Price 

Interest receipts 

Interest payments 

Net tourist paymentsᵈ

Net immigrants’ remittances 
Net shipping payments 

Real income 

Cost of living 

Money income 

Competitors’ price 


World 
Money income 


(in U.S. dollars) 





* Not given.

a Coefficient of multiple correlation.

b U.S. import price (with tariff) divided by U.S. cost of living.

c U.S. export price divided by U.S. competitors’ export price.

d “1931 and 1932 are excluded because of the abnormal influence of world exchange depreciation.”
(p. 149)
Chang derives the “cyclical pattern of American income account,”
expressed in millions of dollars of change, by applying these elasticities
to average values in the base period 1925-1926. For a one per cent rise
in world real income (and a corresponding 2.15 per cent change in
U.S. real income) he has:


Imports                   −173.3
Exports                   +195.0
Interest receipts          +18.9
Interest payments           −7.9
Other current items        −14.9


Net change in balance      +17.8
(All items rise; as usual, the minus signs represent debit items.)

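The net figure is the signed sum of the items listed, as a quick check confirms (millions of dollars, debits negative, as in Chang's tabulation).

```python
# Changes for a 1 per cent rise in world real income, as tabulated; debits carry a minus sign.
CHANGES = {
    "Imports": -173.3,
    "Exports": +195.0,
    "Interest receipts": +18.9,
    "Interest payments": -7.9,
    "Other current items": -14.9,
}
net_change = round(sum(CHANGES.values()), 1)  # 17.8
```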

What we get out of this for each of the six countries is the usual direc- 
tion and average amplitude of fluctuation in each major item in its 
balance of payments relative to a given movement in the “world cycle” 
(or, if one wishes, in the country’s own real income), and some notion 
of the separate shares of price and quantity variation in import and 
export value change. 
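Because Chang's equations are linear in logarithms and value equals price times quantity, the elasticity of an item's value with respect to a given factor is the sum of its price and quantity elasticities. This is what allows a value change to be split into price and quantity shares. The elasticities below are hypothetical.

```python
import math


def value_elasticity(price_elasticity, quantity_elasticity):
    # log V = log P + log Q, so elasticities with respect to the same factor add.
    return price_elasticity + quantity_elasticity


# Hypothetical elasticities of an export item with respect to world real income.
e_price, e_quantity = 0.6, 1.5
e_value = value_elasticity(e_price, e_quantity)  # 2.1
price_share = e_price / e_value                  # share of value change due to price, about 0.29

# Under constant elasticities the additivity is exact for any size of change.
growth = 1.01
assert math.isclose(growth ** e_price * growth ** e_quantity, growth ** e_value)
```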

In order to be able to say something about “typical” differences be- 
tween raw-material producers and industrialized countries, Chang sup- 
plements these calculations for the sample of six countries with a less 
detailed examination of the data for another 15 or 16 countries. For 
these countries he determines the first two of the equations given in 
the above list for the United States, and thus obtains the elasticities 
of quantity of imports with respect to real domestic income and rela- 
tive price (his Table 4) and of quantity of exports with respect to real 
world income and relative price (Table 6). All told, then, he has the 
factors determining (or associated with) the imports of 21 and the ex- 
ports of 22 countries, with 19 common to both lists. Again, the period 
is 1924-1938, the data are annual, and the trends have been eliminated. 

Chang finds, with respect to imports, that the price factor is of minor 
importance: all but one of the elasticities are below unity, and for 13 
countries they are below .5, neglecting the minus sign. These findings 
tell us something about the extent to which price fluctuations are asso- 
ciated with quantity fluctuations (income constant). Chang’s reading 
of them as demand elasticities in the neoclassical sense, however, raises 
questions of the sort that troubled economists in interpreting statistical 
demand curves during the 1920’s. Indeed, Chang’s interpretation of his 
results—reproduced from his earlier published articles—has already 
been criticized (see, for example, Guy Orcutt’s article in the May 1950 
Review of Economics and Statistics). 










Chang concludes that the important demand factor is income: all 
but one of the income elasticities are above unity, with those of “in- 
dustrialized” countries lower than those of agricultural and mining 
countries. That is, “given a uniform economic expansion or contraction
all over the world, the quantity of imports of the industrial countries 
tends to fluctuate less than that of the world average; and, that of the 
agricultural [and mining] countries tends to fluctuate more” (p. 43). 

With respect to exports, the price elasticity (again ignoring the minus 
sign) is above unity for 4 countries, between .5 and 1.0 for 9, and below
.5 for the remaining 9. Income elasticities are above unity for virtually
all the industrial and mineral producing countries (Canada is classed
here as mining; in Table 4 it was classed as agricultural); they are
below unity for all the agricultural countries. “Looking at the results
broadly, we find that Table 6 depicts a situation reverse to that of 
Table 4. The countries whose import income elasticity is less than 
that of the world as a whole are those whose export income elasticity 
is greater than that of the world as a whole; and conversely. Or, speak- 
ing more generally, for the former cases, the import income elasticity 
tends to be smaller than the export income elasticity; whereas, for the
latter cases, the export income elasticity tends to be smaller than the 
import income elasticity” (p. 51). 

Chang is rather careless here, for he is comparing the import elastic- 
ity of a country with respect to its own income, and its export elasticity 
with respect to world income. It is something of a jump to imply here, 
and assert explicitly later (p. 170), that “the difference in the
magnitude of import and export income elasticity of all the agricultural
countries tends to result in a large and unfavorable change in relative
quantities in prosperity.” Except for the six countries studied in detail,
Chang presents no data showing how the income of individual countries
fluctuates in relation to world income. However, the statement
about change in relative quantities might be warranted. Agricultural
output in the United States (and therefore presumably also purchases 
of such output by domestic “industry”) fluctuates within a narrower 
range than does mining and manufacturing output (and therefore, 
presumably, purchases of such products by farmers). This is probably 
true of trade within other countries, and possibly of international trade 
as well. 

Chang’s elasticity calculations relate to quantities, not values. Ex- 
cept for the six countries he provides no evidence on the cyclical fluc- 
tuations in terms of trade. However, it is pretty clear, again from in- 
formation outside of Chang’s book, that the terms of trade change in 
favor of agricultural communities when business improves. Further,
Chang shows (for eleven agricultural countries combined, p. 169) that 
the balance of merchandise trade was negative during 1924-1930 and 
positive during 1931-1938. Chang is led to assert, therefore, that 
change in relative quantities tends to more than offset relative price 
change (p. 170). 

The picture for industrial countries is the opposite, of course. In 
their case, the merchandise export balance rises as world business im- 
proves and falls as world business worsens. 

In the case of mining countries, Chang believes the change in relative 
quantities to be small; but the change in relative prices to be great 
and in favor of these countries as world business expands. Therefore 
their merchandise export balance tends to behave like that of industrial 
countries. 

If all this is true it means (ignoring minor items in the balance of pay- 
ments) that net capital export by the industrial and mining countries 
tends to be greater (or net capital import smaller), during world 
prosperity than during world depression, with the reverse for agricul- 
tural countries. These generalizations about changes in capital flows 
and associated changes in trade quantities, prices, and values during 
fluctuations in world business constitute Chang’s major findings. 

It is clear that the scope of Chang’s results is narrow and that we are 
far short of having all the answers to the questions listed earlier. But 
no investigator working alone could have gotten very far in filling that 
bill. 

Another criticism, however, must be made of the adequacy of his 
results. There, too, the conclusion is clear. We are not sure of the 
answers that Chang has given us. And the grounds for our doubts are 
much the same as those that inevitably restrict the scope of his results. 

One reason is the limited range of the data analyzed. Whatever ap- 
paratus is applied in the analysis, it cannot be expected to extract from 
short series of annual data reliable answers to our questions. It might 
perhaps be argued that the world economy of pre-1914 days was so 
different from that of the inter-war period, that we must make shift 
with the 1924-1938 data if our concern is with the workings of the 
economic world of that period; but this cannot be determined a priori,
and I do not believe that it jibes with the known facts. Chang should 
not and need not have confined himself to the period 1924-1938. Nor 
should he have restricted himself to annual data, for annual data sup- 
press too many significant features of cyclical change. Limits on his 
time and energy would have forced him to be less ambitious in other 
directions, but I suspect that he chose the inferior alternative. Study
of a longer period might also have forestalled some of the very serious 
questions that arise about his trend eliminations. Chang tells us prac- 
tically nothing about them. Since we know that his period of fifteen 
years is short, and that it includes an exceptionally long cycle, we tend 
to imagine the worst about the adequacy of his trends. 

Even the short series might have yielded more than Chang squeezed 
out of them had he used some other apparatus of analysis. In effect, he 
made up a scatter diagram on a double-log scale, plotting (say) exports
against income, drew a regression line through the fifteen points in the 
scatter, read off the slope of the line, and discarded the basic data 
and the diagram. The regression coefficient (i.e., the elasticity) hardly 
tells all that might be learned from the basic data or even the chart. 
For example, is the slope greatly influenced by the extreme points; 
that is, is the slope a reflection mainly of a single large cycle—that of 
the 1930’s—or does it fairly reflect all the cycles including the (two) 
smaller ones? Chang provides enough scattered information to raise 
some serious doubts about this, but no systematic examination is made 
nor is enough information given to enable the reader to undertake it 
for himself. 
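The procedure the reviewer attributes to Chang, fitting a straight line to a double-log scatter and keeping only its slope, can be sketched in a few lines of Python, together with the check he asks for: refit after setting aside particular years and see whether the slope moves. This is a minimal illustration, not Chang’s actual computation; the function names are hypothetical, and the series are assumed to have been detrended already.

```python
import math


def ols_line(xs, ys):
    """Least-squares line y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def log_log_elasticity(income, quantity):
    """Slope of the straight line through a double-log scatter,
    read as a constant elasticity of quantity with respect to income."""
    _, slope = ols_line([math.log(v) for v in income],
                        [math.log(v) for v in quantity])
    return slope


def elasticity_excluding(years, income, quantity, excluded_years):
    """Refit after dropping selected years (e.g. the 1930's contraction)
    to see how much those points drive the slope."""
    keep = [k for k, yr in enumerate(years) if yr not in excluded_years]
    return log_log_elasticity([income[k] for k in keep],
                              [quantity[k] for k in keep])
```

A large gap between the full-sample slope and, say, `elasticity_excluding(years, income, quantity, range(1930, 1934))` would suggest that the single large cycle of the 1930’s dominates the estimate.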

When three-variable scatters are used, another question arises. Mul- 
tiple correlation coefficients are usually given by Chang. However, the 
standard errors of the regression coefficients are conspicuous by their 
absence, and nothing is said about intercorrelation between factors. 
This is a serious deficiency in a context in which the problem of multi- 
collinearity is important. 
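The missing statistics are not hard to compute. The sketch below assumes a linear regression in logarithms on two regressors (for instance log income and log relative price) and reports the coefficients together with their standard errors, the multiple correlation coefficient, and the correlation r12 between the two regressors; the names are hypothetical. Because the variance of each coefficient is proportional to 1/(1 - r12**2), a high intercorrelation inflates the standard errors even when the multiple correlation is high.

```python
import math


def two_regressor_fit(x1, x2, y):
    """OLS of y on a constant, x1 and x2.
    Returns coefficients, standard errors, multiple R and regressor correlation r12."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    syy = sum(c * c for c in dy)
    det = s11 * s22 - s12 ** 2            # near zero when regressors are collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    rss = syy - b1 * s1y - b2 * s2y        # residual sum of squares
    s2 = rss / (n - 3)                     # residual variance; 3 parameters fitted
    se1 = math.sqrt(s2 * s22 / det)        # = sqrt(s2 / (s11 * (1 - r12**2)))
    se2 = math.sqrt(s2 * s11 / det)
    r12 = s12 / math.sqrt(s11 * s22)
    mult_r = math.sqrt(1 - rss / syy)
    return {"b1": b1, "b2": b2, "se1": se1, "se2": se2,
            "R": mult_r, "r12": r12}
```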

There is a question also about the use of straight lines on double-log 
paper. Do they always tell the story? They could not, for example, 
express adequately the relation between an asymmetrical cycle in one 
series and a symmetrical cycle in another. 

The reader will realize that I am suggesting the use of some such 
apparatus as Mitchell’s in studying cycles; but was not this apparatus 
designed by an expert for the very purpose? 

Chang assumes that a rather tightly-knit world economy existed in 
the inter-war period. This is a basic point, for his major finding is in 
terms of a world cycle. He notes at one place (p. 15) that the timing 
and intensity of cyclical fluctuation in the different countries are not 
the same, but on the next page dismisses this as unimportant. He seems 
to have been a victim of his choice of data and method of analysis, 
which led him to assume the world-wide diffusion of the depression of 
the 1930’s to be characteristic of cycles generally. (A. F. Burns points 
out that “after 1919 the business cycles of different countries tended to
drift apart, though practically all shared in the catastrophic contrac- 
tion of 1929-32;” see Papers and Proceedings, American Economic Re- 
view, May 1949, p. 82.) Careful examination even of that episode might 
have raised some doubts in his mind about the assumption of a closely 
integrated world economy. By failing to present data for more than the 
six countries on how the income of individual countries fluctuates in 
relation to world income, he fails to provide the basis for such an 
examination, and fails to present convincing evidence for his assertion 
that the “trade cycle is a world-wide phenomenon” (p. 220). Indeed, 
we know that the peak even before the contraction of the 1930’s came 
at different times (even on an annual basis—see Thorp’s business
annals for 1926-1931, News-Bulletin of the National Bureau of Economic
Research, Sept. 1932); and Chang’s equations for the six countries
suggest how greatly contractions have varied in severity (the “elasticity”
of national real income with respect to world real income ranged from
.58 for the U. K. to 2.15 for the U. S.).

As part and parcel of the above assumption Chang assumes that 
there is a single world market for all export countries and that world 
income is the dominant demand factor with respect to each. However, 
the “world market” is a group of markets, closely interrelated for some 
commodities, loosely for others: recall A. J. Brown’s observations (in 
Chapter VI of his Applied Economics), and Chang himself notes that 
countries have “exclusive” markets and that “world market is an am- 
biguous notion” (pp. 53, 70). Since the incomes of the various importing 
countries do not in fact fluctuate identically, we cannot expect that all 
exporting countries will be confronted with the same demand condi- 
tions. The aggregate of world income is therefore hardly an appropriate 
measure of the strength of the demand confronting any individual 
country. But even if there were a world market, Chang’s simple ag- 
gregate of national incomes could not be an adequate measure because 
it makes no allowance for international differences in import propensi- 
ties. 
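The last point can be made concrete with a toy calculation. The sketch compares a simple sum of national incomes with an index in which each country’s income is weighted by an assumed marginal propensity to import. The countries, propensities, and incomes used below are invented for illustration and do not come from Chang or the reviewer.

```python
def simple_world_income(incomes):
    """Plain sum of national real incomes, as in a simple world aggregate."""
    return sum(incomes.values())


def propensity_weighted_income(incomes, propensities):
    """Hypothetical alternative: each importing country's income weighted
    by an assumed marginal propensity to import."""
    return sum(propensities[c] * y for c, y in incomes.items())
```

With incomes of 100 in hypothetical countries A and B in one period and 90 and 110 in the next, and assumed propensities of 0.3 and 0.05, the simple sum stays at 200 while the weighted index falls from 35 to 32.5: the same "world income" can conceal a fall in the demand facing a given exporter.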

Chang also ignores an important factor on the supply side. Agricul- 
tural output is influenced greatly by the weather. Because this varies 
over the surface of the earth, individual countries here and there will 
enjoy a bountiful harvest or suffer a drought, although world agricul- 
tural output will be fairly steady. The value of a country’s agricultural 
exports might therefore be high when that of other countries is low, 
low when that of other countries is high. (And this in turn, by influencing
domestic business conditions outside the agricultural sphere [see
Mitchell’s What Happens during Business Cycles, p. 58, footnote, and
his earlier 1913 report], will contribute to diversity of national income
experience among the countries of the world.) 

There are some questions, finally, about the accuracy of Chang’s 
basic data, as well as the combinations he made of them. Students of 
income statistics will wonder about the adequacy of the real income 
estimates for the score of countries covered. Others will raise their eye- 
brows at some of the series on quantum and price of imports and ex- 
ports. Chang mentions sources but does not go sufficiently into the de- 
tails of the construction of the estimates or their adequacy for his 
purposes; nor does he indicate what change in his findings might result 
were use made of alternative estimates, where these are available. 

We are left, at the end, with doubts not only about the applicability 
of Chang’s findings for 1924-1938 to other periods, but also about their 
adequacy for the period he covered. 

The chapter in Viner’s book to which reference was made above 
opened with a statement unearthed by him from a work published in 
1857: “Many writers have perplexed themselves and their readers by 
founding theories on exceptional circumstances. Others have been led 
astray by statistics—the characteristic form of modern research.” 
Chang’s book illustrates both dangers. Yet we should not forget that 
his study is the first attempt at a systematic survey of cyclical move- 
ments in the balance of payments of a wide variety of countries. His 
energy—even his boldness—in grappling with stubborn facts sets us 
an example. While we cannot follow in his footsteps all the way, we 
may discover the right path more easily because of his pioneering ef- 
forts. 








DEMAND ANALYSIS* 


H. S. Houthakker
University of Chicago 


IN A field where papers are plentiful but books are rare the appearance
of a monograph by one of the leaders in its development raises high
expectations. Professor Wold is perhaps best known as a mathematical 
statistician, but he has also made valuable contributions to economic 
theory [15] and his earlier empirical study on the demand for farm 
products [14] would no doubt be generally recognized as a classic if 
its language and time of publication had not curtailed its circulation.
For the present work he associated himself with Mr. Juréen, a
government statistician; he has also drawn upon the support of a number of
other distinguished collaborators in Sweden and elsewhere. It should 
be said at once, lest the following criticisms obscure the appreciation, 
that the hopes thus raised are not disappointed. Demand Analysis is a
highly instructive and provocative work that no economist or statisti- 
cian could consult without profit, and for specialists it is indispensable. 

In common with other branches of econometrics the study of con- 
sumers’ behavior requires knowledge of the relevant chapters of eco- 
nomic theory and statistics in addition to skill in the interpretation 
and utilization of observational data. The book under review is organ- 
ized around these three elements. After a first part which surveys the 
subject and summarizes the results there follow sections on the theory 
of choice, on stochastic processes, and on regression analysis; finally,
some empirical investigations of the demand for foodstuffs in Sweden 
are discussed. Three of the five parts are completed by numerous exer- 
cises, some of which contain interesting new results. 


ECONOMIC THEORY 


The purpose of theory in an “applied” subject consists mainly in the 
formulation of (a) concepts in terms of which observations can be use- 
fully described and (b) theorems, phrased in those terms, that allow 
statements about situations for which adequate observations are lack- 
ing. The usefulness of these concepts depends on their maintaining a 
certain invariance between different situations; thus, if the quantities 
bought by consumers showed little or no correlation with prices and 
incomes the concept of a “demand function” would not be a useful one. 





* A review article of Demand Analysis: A Study in Econometrics, by Herman Wold in association 
with Lars Juréen. New York: John Wiley and Sons; Stockholm: Almqvist and Wiksell, 1953. Pp. xvi,
358; $7.00. 
Professor Wold is therefore rightly concerned to show from empirical
evidence that demand functions, which are the cornerstone of consump- 
tion theory, do possess such stability. To what extent he has in fact 
shown this is a question to which we shall have to return. 

The formulation of demand functions is not the ultimate aim of 
what the author describes as the “Paretoan” theory of consumer de- 
mand, though it was as far as the “demand function approach” (G. 
Cassel [3]) was prepared to go. In order to state theorems about these 
functions additional assumptions have to be made. These assumptions,
which have been expressed in different ways, link the demand functions 
for an individual consumer with his preferences for various collections 
of goods. The now classical approach, due to Pareto and also favored 
by Wold, attributes to the consumer a consistent preference ordering 
for all such collections. A more recent version, advanced by Allen [1], 
assumes that the consumer only compares collections that are very close 
to each other;¹ this, however, is not a genuine generalization, for as
soon as there is a finite difference between compared collections a chain 
of comparisons can be made and we are back to the preference ordering 
approach. A third approach, originally proposed by Samuelson [10], 
expresses consistency of preferences directly as a property of the de- 
mand functions; the reviewer has shown [6] that this “revealed prefer- 
ence” approach, when appropriately formulated, is also equivalent to 
the classical approach. 

It has appeared worth while to go through these theoretical points 
because Professor Wold’s discussion may easily mislead the unsuspect- 
ing reader. His theorem 4.6.1, in fact, seems to assert not only the 
equivalence of the marginal substitution and preference ordering ap- 
proaches, but also of the latter and Cassel’s demand function approach. 
In other words, the author does not regard the assumption that demand 
functions are derived from consistent preferences as an additional one; 
in his view these functions, lest they be “self-contradictory,” must 
always satisfy the so-called “integrability condition,” which expresses 
this consistency. Similarly, he declares the revealed preference ap- 
proach to be merely a variation of the demand function approach. In 
doing so he misinterprets both, for it has been shown [6] that the strong 
axiom of revealed preference, which is certainly not satisfied by an 
arbitrary set of demand functions, is a necessary and sufficient condi- 
tion for the existence of a consistent preference ordering. Using words 





1 Wold ascribes this “marginal substitution” approach also to Hicks, but this is very questionable. 
Even as early as the well-known pair of papers by Hicks and Allen of 1934 [5] a discrepancy between the 
views of the two authors was evident. 
in their usual meanings there is nothing “self-contradictory” in a set of
demand functions for which the integrability condition does not hold;
all one can say is that the notion of preference does not apply to such a
set. Wold’s theorem 4.6.1 is therefore based on a petitio principii. If it
holds true for the marginal substitution approach, this is only for the 
reasons given in the previous paragraph, and not because of Wold’s 
circular argument. 
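The strong axiom can be checked mechanically on a finite set of observations. The sketch below uses the usual finite-data form: bundle i is directly revealed preferred to bundle j when j cost no more than i at the prices under which i was chosen, and the axiom fails if two distinct bundles are each (directly or indirectly) revealed preferred to the other. Indirect preference is obtained with Warshall’s transitive-closure algorithm. The function names are illustrative, and this finite-data test is an adaptation, not Wold’s or the reviewer’s own formulation.

```python
def cost(prices, bundle):
    """Expenditure on a bundle at given prices."""
    return sum(p * q for p, q in zip(prices, bundle))


def satisfies_strong_axiom(prices, bundles):
    """Finite-data check of the strong axiom of revealed preference.

    prices[i] is the price vector and bundles[i] the bundle (a tuple)
    chosen in observation i.
    """
    n = len(bundles)
    # Direct revealed preference: j was affordable when i was chosen, and j differs from i.
    rel = [[i != j and bundles[i] != bundles[j]
            and cost(prices[i], bundles[j]) <= cost(prices[i], bundles[i])
            for j in range(n)] for i in range(n)]
    # Warshall's algorithm: add indirect revealed preference (transitive closure).
    for k in range(n):
        for i in range(n):
            if rel[i][k]:
                for j in range(n):
                    if rel[k][j]:
                        rel[i][j] = True
    # The axiom fails if two distinct bundles are each revealed preferred to the other.
    return not any(rel[i][j] and rel[j][i]
                   for i in range(n) for j in range(n)
                   if bundles[i] != bundles[j])
```

For example, choosing (1, 2) at prices (1, 2) and (2, 1) at prices (2, 1) violates the axiom, since each bundle was affordable when the other was chosen; swapping the two choices satisfies it.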

In any review more space is inevitably devoted to criticism than to 
commendation, and we add at once that apart from this slip² the
author’s exposition of the preference ordering approach in Chapter 4 is
lucid and original. Chapters 5, 6, and 7, dealing with the specification
of demand patterns, relations between demand elasticities, and market
demand, lift consumption theory above the formal level on which it is
too often discussed. Still more stress might have been laid, however, on 
the preponderance of corner equilibria and the resulting restrictions on 
the validity of traditional calculus methods. Curiously enough the 
author has failed to see that Hicks’ method of deriving market demand 
is exactly the same as his own, so that his criticisms are unfounded. 
The exercise 2.27, which is claimed to illustrate Wold’s objection, is 
highly instructive nevertheless. In Chapter 8 applications of preference 
theory to the supply of labor, to barter and to price index numbers are 
discussed. 

Although Professor Wold points out and evaluates some of the 
limitations to Paretoan demand theory resulting from its static, non- 
stochastic and individualistic character, there is very little discussion of 
a possibly more serious complication, viz. that arising from consumers’ 
assets. These assets, particularly in the form of durable consumption 
goods, lead to indivisibility problems and to the explicit introduction of 
time into the budget (see the recent work of Theil [13] and also Bould- 
ing [2]). Formally these problems may be covered by an extension of 
Wold’s axioms, but in practice this is not very helpful; in fact the most 
difficult and interesting questions of theoretical and empirical demand 
research are precisely in this area. In a book entitled Demand Analy- 
sis readers should at least have been made aware of this field and 
referred, for instance, to the investigations of De Wolff [4] and Roos 
and Von Szeliski [9] on automobile demand. The preoccupation of the 
empirical chapters with food demand, though otherwise understand- 
able, may also leave students with exaggerated notions as to the scope 
of a purely static approach. 





2 It might have been avoided if the author had paid more attention to Samuelson’s questioning [11] 
of a similar theorem in [15]. As will be seen from the above Wold is also incorrect in describing the result 
in [6] as mathematically equivalent to his own assertion on the demand function approach. 
STATISTICAL METHODS


The statistical controversies in demand analysis are perhaps of more 
importance than the theoretical questions reviewed above, since the 
former have a greater bearing on the numerical results obtained. Early 
in the history of econometrics it was recognized that a statistical ap- 
paratus designed mainly for biological experiments may not be entirely 
reliable in the analysis of economic observations. Two problems in 
particular have given rise to an extensive literature: serial interdepend- 
ence in time-series and estimation in a system of simultaneous rela- 
tions. On both of these topics, as well as on some related ones, Professor 
Wold has much of interest to contribute. 

His discussion of time-series problems is based on a condensed but 
brilliant exposition of the theory of stationary processes, including a 
description of recent work by P. Whittle. He uses these results to show 
that in certain important cases classical least-squares methods retain 
their optimal properties in large samples, though the traditionally cal- 
culated standard errors may no longer indicate the goodness of fit. The 
special case to which the author devotes most attention is that of a 
recursive system. This is a system of behavior equations which can be 
solved successively rather than simultaneously. An example is the “pig 
cycle” model 


$$d_t = D(p_t), \qquad s_t = S(p_{t-1}), \qquad p_t = p_{t-1} + (d_{t-1} - s_{t-1})$$


where $d_t$ is demand, $s_t$ supply, and $p_t$ price at time $t$.
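To see what solving successively means, the sketch below iterates the model one period at a time. The linear forms of D and S and all parameter values are assumptions chosen only for illustration. With them each new price depends only on lagged values, after which current demand follows, so no simultaneous equations need to be solved.

```python
def simulate_pig_cycle(p0, p1, periods, a=10.0, b=0.5, c=2.0, e=0.3):
    """Solve the recursive pig-cycle model step by step.

    Assumed linear forms (hypothetical parameters):
        demand  d_t = a - b * p_t
        supply  s_t = c + e * p_{t-1}
        price   p_t = p_{t-1} + (d_{t-1} - s_{t-1})
    """
    prices = [p0, p1]
    demand = [a - b * p1]    # d_1
    supply = [c + e * p0]    # s_1
    for _ in range(periods):
        p_next = prices[-1] + (demand[-1] - supply[-1])  # uses lagged values only
        supply.append(c + e * prices[-1])                # s_t = S(p_{t-1})
        demand.append(a - b * p_next)                    # d_t = D(p_t), p_t now known
        prices.append(p_next)
    return prices, demand, supply
```

With these values the equilibrium price is (a - c)/(b + e) = 10; starting from 12, the simulated price overshoots below 10 before settling, a damped pig cycle.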

Professor Wold maintains—or at any rate leaves his reader with the 
impression—that such systems can be almost universally applied in 
economic analysis. It is not easy to summarize his argument, which is 
scattered over several chapters and appendix notes and sometimes 
hedged by qualifications that are not mentioned at a later stage. How- 
ever, there can be no doubt that Wold’s emphasis on recursive systems 
is an essential element in his sceptical attitude towards the simultane- 
ous equations approach developed by Haavelmo, Koopmans, and 
others ([7], [8]). As is well known, the difficulties which prompted the 
latter development do not exist in recursive systems; more particularly, 
the coefficients in all their equations are identified and their least- 
squares estimates are asymptotically unbiased. 
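The estimation property can be illustrated by simulation. The sketch adds independent random disturbances to the demand and supply equations of the same hypothetical linear pig-cycle model and then fits each equation separately by ordinary least squares. Because each right-hand price is determined before the current disturbance occurs, the slopes come out close to the assumed values (-0.5 and 0.3) in a long sample. Parameters, noise level, and sample length are assumptions for illustration only.

```python
import random


def ols(xs, ys):
    """Least-squares intercept and slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def simulate_noisy_pig_cycle(periods, a=10.0, b=0.5, c=2.0, e=0.3,
                             sd=0.5, seed=1):
    """Hypothetical linear pig cycle with independent disturbances:
        d_t = a - b*p_t + u_t,  s_t = c + e*p_{t-1} + v_t,
        p_t = p_{t-1} + (d_{t-1} - s_{t-1}).
    Returns p (periods + 2 values) and d, s (periods + 1 values each)."""
    rng = random.Random(seed)
    p = [10.0, 10.0]                       # p_0, p_1
    d = [a - b * p[1] + rng.gauss(0, sd)]  # d_1
    s = [c + e * p[0] + rng.gauss(0, sd)]  # s_1
    for _ in range(periods):
        p_t = p[-1] + (d[-1] - s[-1])      # price from lagged values only
        s.append(c + e * p[-1] + rng.gauss(0, sd))
        d.append(a - b * p_t + rng.gauss(0, sd))
        p.append(p_t)
    return p, d, s


def estimate_equations(p, d, s):
    """Fit each equation separately by OLS; d[k] pairs with p[k+1], s[k] with p[k]."""
    _, demand_slope = ols(p[1:], d)
    _, supply_slope = ols(p[:-1], s)
    return demand_slope, supply_slope
```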

These are clearly desirable properties, and it is therefore necessary
to consider how wide the scope of recursive systems in fact is. Strictly 
speaking they are indeed universal: there is every reason to believe that 
the economy, to use an anthropomorphic simile, solves its simultaneous 
equations by trial and error. Individuals adjust their actions to 
parameters which they regard as fixed, but which are subsequently
themselves affected by these actions so that new adjustments are re- 
quired. Static models in which equilibrium values determine each other 
without sequence or lags are an abstraction, and have been recognized 
as such for almost as long as they have been used. This does not mean 
that they are useless, nor even that they are less realistic than recur- 
sive systems, in which such lags occur explicitly. 

The problem here is that these lags are of very different length, rang- 
ing from a few seconds for the response of share or commodity prices 
to shifts in excess demand all the way to several years for the produc- 
tion of new ships or roads. In recursive systems of the kind described 
by Wold, however, lags have to be integral multiples of some unit pe- 
riod, usually the period to which the observations refer. If, as is usual 
in econometrics, annual observations have to be used, a one-year or 
two-year lag can easily be fitted in, but the question of what exactly 
should be done with lags of other lengths has not received Professor 
Wold’s attention. In the absence of such a discussion his stress on re- 
cursive systems does not carry complete conviction. 
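A small calculation shows why lags that are not whole multiples of the observation period are awkward. Suppose, hypothetically, that a monthly series y follows another series x with a lag of six months. Summed into calendar years, each annual value of y is the second half of last year’s x plus the first half of this year’s, so it matches neither the current nor the previous annual total of x, and no integral one-year lag reproduces it exactly. The sketch below checks this for an assumed 40-month cycle; the series and function names are illustrative.

```python
import math


def hypothetical_cycle(months, period=40, level=100.0, amplitude=10.0):
    """An assumed monthly series with a regular cycle of `period` months."""
    return [level + amplitude * math.sin(2 * math.pi * m / period)
            for m in range(months)]


def annual_totals(monthly):
    """Sum a monthly series into calendar-year totals (complete years only)."""
    return [sum(monthly[k:k + 12]) for k in range(0, len(monthly) - 11, 12)]


def six_month_lag_annual(monthly_x):
    """Annual totals of y_m = x_{m-6} for years 1, 2, ...; year 0 is dropped
    because its first six months would need values of x before the sample."""
    y = [monthly_x[m - 6] for m in range(12, len(monthly_x))]
    return annual_totals(y)
```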

We may perhaps detect here, as in other places, the results of an 
insufficient analysis of the use of approximative theoretical models in 
empirical research. The logic of regression analysis is treated with its 
application to experimental data as a prototype; with the aid of this 
interpretation rules for the selection of dependent variables are given. 
There is much play with the notion of causality, even though it has long 
lost its former pre-eminence in the physical and biological sciences and 
has never been very popular with economists, who tend to think in 
terms of functional rather than causal relationships.* Its introduction 
in any case hardly helps to bring out the difficulties peculiar to infer- 
ence from non-experimental observations, arising mainly from the fact 
that the latter usually have to be taken as they come and are available 
in limited number only. Models therefore have to be chosen with 
reference to the data with which they are to be used. The resulting 
problem of how to choose between models is nowhere faced squarely in 
Wold’s work, though there is a somewhat inconclusive discussion of the 
effects of additional regressors on the estimates of parameters already 
taken into account. 

What is lacking, to put it in other words, is an adequate treatment 
of small-sample estimation. This complaint is of course not addressed
to Professor Wold alone, for after Student’s and Fisher’s classical con-





3 Recently an attempt to rehabilitate the notion of causality has been made by Simon [8], whose
point of view is very similar to that of Wold.
tributions progress in this important area has been disappointing. Con-
temporary interest in estimation problems seems to be mainly centered
on estimators with various asymptotic properties whose practical use- 
fulness is often hard to see. Under these circumstances the author is 
unfortunately right in stating that for small samples we have to be 
satisfied with “the rough inference drawn by the use of large-sample 
methods,” but this should not imply an abandonment of the search for 
more appropriate procedures. 

These comments are mostly elicited by Wold’s stimulating Chapter 
2, in which he attempts to show that least-squares regression, despite 
the objections from Oslo and Chicago, is “essentially sound.” The 
words in quotes are in fact characteristic of his attitude of militant con- 
servatism on many points of controversy in statistics and elsewhere. 
Not all readers will find their doubts quieted by the forceful but occa- 
sionally one-sided array of arguments, but they will learn a great deal 
from trying to refute them. 


EMPIRICAL FINDINGS 


The empirical part, for which Mr. Juréen was jointly responsible,
describes an extensive investigation of food demand in Sweden under- 
taken in connection with an inquiry into the long-term position of 
Swedish agriculture. Both family budgets and market statistics are 
used as source material. In line with Professor Wold’s views on statisti- 
cal methods as discussed above only a few technological innovations 
are to be noted. The numerical results, subject to the validity of the 
techniques employed, are on the whole very reasonable and their dis- 
cussion, again with this proviso, is competent and illuminating. Our 
qualification refers to the authors’ neglect of the supply side when deal- 
ing with demand equations, but this neglect is of course deliberate. 

The work on family budgets (Chapter 16) is based on three Swedish 
surveys dating from 1913, 1923, and 1933. It is shown that estimates of 
national food consumption obtained by blowing up averages from the 
latter survey agree fairly closely with independent estimates of market 
demand. This is the more remarkable because the sample was not ran- 
dom but voluntary and participants had to keep detailed accounts for 
a whole year. It is in fact by no means clear that the voluntary ap- 
proach is really inferior to the random sampling methods (with inevi- 
tably low response rates) currently in vogue. 

From these budgets income elasticities are estimated for a considerable
number of commodities and family types, quantity and expenditure
elasticities being distinguished. In most of the analyses constant-elasticity
formulas are applied, but since the results reveal that the
elasticities are not independent of income, some use is also made of
the group of formulas suggested by Törnqvist. No standard errors of
the estimates are calculated, on the ground that they are not theoreti- 
cally justified for this material. 
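Törnqvist’s formulas are not written out in the review. As one hedged illustration, the form usually quoted for necessities is q = alpha*y/(y + beta), whose income elasticity beta/(y + beta) falls as income y rises, unlike a constant-elasticity curve. The sketch below codes that form and its elasticity, with a numerical check; the parameter names are illustrative, and the book may use other members of the group for other commodities.

```python
import math


def tornqvist_necessity(income, alpha, beta):
    """Engel curve of the form usually quoted for necessities: q = alpha*y/(y + beta)."""
    return alpha * income / (income + beta)


def tornqvist_necessity_elasticity(income, beta):
    """Its income elasticity, d ln q / d ln y = beta / (y + beta); falls as y rises."""
    return beta / (income + beta)


def numerical_elasticity(f, income, h=1e-6):
    """Central-difference check of d ln f / d ln y."""
    up, down = income * math.exp(h), income * math.exp(-h)
    return (math.log(f(up)) - math.log(f(down))) / (2 * h)
```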

Earlier in the book (Chapter 14) there is a short discussion of equiva- 
lent adult scales where a method for determining the weights is pro- 
posed. This consists in calculating income elasticities in two ways: by 
pooling the separate estimates for different family types and by deriving
a joint estimate for all households after their expenditures have
been divided by the relevant number of equivalent adults. If the scale 
is correct, the two calculations will yield the same result. This agree- 
ment, however, is in general only a necessary and not a sufficient con- 
dition for correctness. Except in the case where only two kinds of per- 
sons (children and adults, for instance) are taken into account no 
unique scale will be obtained by Wold’s method, and in practice it is 
desirable to specify many more categories of persons. Moreover, al- 
though Wold recognizes that the scale for total expenditures should 
be different from the scales for particular items this point does not 
seem to be allowed for in his method of computation. 
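The proposed test of an equivalent adult scale can be sketched as follows, under strong simplifying assumptions: two hypothetical family types, food expenditure generated exactly by a constant per-equivalent-adult elasticity, and log-log fits. With the true scale the joint per-equivalent-adult regression reproduces the elasticity found within each type; with a wrong scale it does not. As the reviewer stresses, agreement is only a necessary condition, so passing this check does not establish that a scale is correct.

```python
import math


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def type_elasticities(households):
    """Separate log-log income elasticity of food for each family type."""
    groups = {}
    for h in households:
        groups.setdefault(h["type"], []).append(h)
    return {t: ols_slope([math.log(h["income"]) for h in hs],
                         [math.log(h["food"]) for h in hs])
            for t, hs in groups.items()}


def joint_elasticity(households, scale):
    """One log-log fit over all households after dividing income and food
    expenditure by the assumed number of equivalent adults, scale[type]."""
    xs = [math.log(h["income"] / scale[h["type"]]) for h in households]
    ys = [math.log(h["food"] / scale[h["type"]]) for h in households]
    return ols_slope(xs, ys)


def make_households(true_scale, eta, incomes_by_type):
    """Hypothetical data: food per equivalent adult = (income per equivalent adult) ** eta."""
    return [{"type": t, "income": y,
             "food": true_scale[t] * (y / true_scale[t]) ** eta}
            for t, incomes in incomes_by_type.items() for y in incomes]
```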

Professor Wold’s views on the relation between the income elastici- 
ties estimated from family budgets and those estimated from time series 
are also worth noting, especially since he was (in [14]) the first to com- 
bine the two sources in the manner now widely adopted. He distin- 
guishes between short term and long term elasticities and maintains that 
the two sources both estimate the latter variety, which is usually the 
more interesting one. Because of the continuous introduction of new 
commodities, however, he thinks that the elasticities obtained from 
budget data on the whole tend to be smaller than those that refer to 
market statistics. This is an interesting observation, but it is not so 
clear that the time-series elasticity is really a long term figure and this 
somewhat weakens the author’s conjecture. 

In their work on market statistics (Chapter 17) Messrs. Juréen and
Wold frequently use “conditional” regression analysis with income 
elasticities inserted as if they were known from other sources; they do 
not always use the estimates obtained from budget data but supple- 
ment these by “common sense” arguments. Sometimes they fix the 
price elasticity instead of the income elasticity. Standard errors of the 
estimates are calculated by means of a new formula which allows for 
autocorrelated disturbances. The symmetry of cross-price elasticities 
according to Slutsky and Hotelling is tested; it is found to hold rather 
strikingly in the case of pork and beef, but much less so for animal and
vegetable foodstuffs. 

Some more general points concerning market demand are discussed in
Chapter 15. Wold there pronounces himself against trend removal but in
favor of deflating prices and incomes by a general price index. With the 
first advice the reviewer agrees, and, at any rate in the case of food- 
stuffs, also with the second. The author’s argument on the latter ques- 
tion, however, is very superficial. He recommends deflation to correct 
for “changes in the monetary unit,” and refers to Schultz [12] in sup- 
port. Schultz, on the contrary, advocated deflation because it increases 
the degrees of freedom by one even though there is no exact method of 
taking changes in other prices into account. This is a much sounder 
argument, which shows incidentally that deflation is not invariably
appropriate, whereas Wold implies that it should always be applied.

In the final Chapter 18 the most interesting contribution is a detailed 
forecast of 1949-50 food consumption on the basis of pre-war demand 
functions tested against actual consumption in the forecast period. It 
is shown that the forecasts are on the whole reasonably accurate, and 
that in most cases they are nearer to the observations than the pre-war 
average on which a “naive” forecast might be based. It would be cap- 
tious to deny that this satisfactory result speaks well for the validity of 
the methods used, whatever doubts one may have about their theoreti- 
cal justification. 


CONCLUSION 


According to the preface, Demand Analysis “is written in the dual
form of a research report and a specialized textbook of econometrics.” 
There are advantages and dangers in such a combination, particularly 
for the textbook half, and both are conspicuous here. The main advantage
is that the methods discussed can be illustrated by actual applications,
though this is perhaps more effective if these applications belong
to several fields instead of to a single rather narrow one, as is the case
here. Moreover, readers will look in vain for actual applications of most
of the economic theory and much of the statistics discussed in the earlier
parts of the book. The dangers of the dual form are even more appar- 
ent. It has been pointed out already that the preoccupation with food 
demand in the empirical sections has led to an unfortunate neglect of 
dynamic factors. If industrial commodities had been studied as well as 
agricultural ones, there might also have been less inclination to ignore 
the complications due to simultaneous equations. 

As a textbook Demand Analysis therefore has serious limitations, 
which means that its use as such requires a considerable amount of
additional explanation and amendment. The arrangement of the mate- 
rial could also be improved and much repetition eliminated. Professor 
Wold could hardly be blamed for not writing a standard work, since 
the subject is still too young to admit of one, but he would have come 
closer to writing one if he had been as successful in interpreting the 
ideas of others as he is in expounding his own. What we have here is 
essentially an admirable statement of the opinions and methods favored 
by one expert for his own research interests, and as such the book is 
an occasion for unqualified gratitude. 
The problem discussed is that of sampling from continuous
and discrete uniform distributions. An application of this 
problem is presented which deals with the analysis of serial 
numbers on manufactured items in order to estimate the total 
number of items manufactured. Estimates of bounded relative 
error are obtained. Some justification for the use of these 
estimates is presented from the loss (cost) function point of
view. Confidence intervals for the parameters are obtained 
and graphs are presented which may be used to determine the 
sample size required for confidence intervals of a given expected
relative length. Tests of hypotheses are discussed. A method is
presented for determining whether the serial numbers obtained are a
random sample from a population of consecutive serial numbers.¹
1. INTRODUCTION


THE analysis of serial numbers has several practical applications.
We shall describe two such uses. The interested reader will no
doubt think of still other applications.

a) A commercial company could use the methods of serial number 
analysis in order to estimate the production and capacity of its
competitors. Representatives from the company could obtain the serial
numbers of showroom equipment as well as equipment in use which has 
been produced by the competitors. Many of the basic methods have 
been developed for analyzing the serial numbers obtained by the com- 
pany representatives (see [3]). 

b) An organization has been using equipment which was purchased 
many years ago. The question was raised as to how many pieces of 
equipment had been purchased. No records were immediately available 
to determine the total purchase, since the purchase had been made 
years ago. Since serial numbers had been placed on each piece of equip- 
ment at the time of purchase, the serial numbers obtained from a 
sample of the equipment could be used to estimate the total purchase. 
Section 5 describes how this method was used to estimate the total 
number of pieces of equipment (desks, bookcases, etc.) which were 
purchased for the Division of the Social Sciences, The University of 
Chicago. 


2. SUMMARY 


Some of the practical problems which are of importance to organiza- 
tions using “serial number analysis” will be considered here. 

The arithmetic involved in the analysis of serial numbers seems to 
be simpler if the unknown production p is “assumed so large that 
variation is continuous” (see [3], p. 629). Some results for the “con- 
tinuous variation” case will be presented which will serve as an approx- 
imation to the exact results. Some exact results will then be discussed. 

The problem of obtaining confidence intervals for the total produc- 
tion p is studied. The sample size necessary to obtain confidence inter- 
vals of a given average relative length is determined. The power of 
tests of hypotheses concerning the true value of the production is also 
examined. 

Rather than use an estimate of the production p which is unbiased 
or which minimizes the average of the squared error (see [3]) it might 
be desirable to have an estimate of which we are “almost certain” that 
it will be no more than, say, 1.2 times p and no less than, say, 0.8p. 
The estimate which maximizes the probability of being included in the 
SERIAL NUMBER ANALYSIS 99 


desired interval may be determined. For example, if d is the difference 
between the largest and smallest serial number in a sample of thirty- 
one serial numbers, then we can be “99.99% confident” that the esti- 
mate 1.20d will be between 0.8p and 1.2p. In other words, we can be 
“99.99% confident” that the relative error of the estimate 1.20d of p 
is less than .2. Justification of the use of such estimates of “bounded 
relative error” is presented within the framework of the theory of 
statistical decisions. 

A method is also presented for testing the basic assumptions made 
in serial number analysis by examining the serial numbers which have 
been obtained. It is possible to test the hypothesis that the serial num- 
bers obtained are a random sample. This method may also be used to 
detect whether there is a change in the procedure of serial numbering. 

An application of the methods described herein is discussed in the 
final section. 


3. CONTINUOUS VARIATION 


In this section we shall assume that the serial numbers have a con- 
tinuous uniform distribution between the initial serial number s and 
the final serial number $s+p$, where the total production p is unknown.
Both the case when the initial serial number s is known and also when 
it is unknown will be considered. 


3.1. Initial Number Known 


When the initial number s is known, we might subtract s from each 
serial number obtained. The serial numbers (after the subtraction has 
been made) will then be uniformly distributed between 0 and p. The 
production p will be estimated using a sample of n serial numbers. 

3.1.1. Confidence intervals. Let us first consider the problem of
obtaining confidence intervals for p. If g is the largest serial number
observed, suppose we state that “the total production p is between g and
ag,” where a is some constant greater than 1. Then the probability that
this statement will be incorrect is $1/a^n$. That is, such a statement will
be incorrect if and only if $ag < p$ ($g < p/a$). If $n = 1$, the probability
that $g < p/a$ is $\int_0^{p/a} dx/p = 1/a$. Since the observations are
independent, the probability that all observations, and therefore g in
particular, will be less than $p/a$ is $1/a^n$. This probability $1/a^n = \alpha$
of making an incorrect statement may be made small by choosing a large
value for the constant a, or by obtaining a large sample of n serial
numbers. We might first determine how small the probability $\alpha$ of
making an incorrect statement should be, and then determine a or n
from the relation $\alpha = 1/a^n$. The interval “g to ag” in which it is stated
that p lies is called the “$(1-\alpha)\cdot 100\%$ confidence interval” since the
probability is $1-\alpha$ that the statement will be correct.

The length of the confidence interval in which it is stated that p lies
is $ag - g = g(a-1)$. Since the expected value of g is $pn/(n+1)$, the
expected length of the interval is $pn(a-1)/(n+1)$. The expected relative
length of the interval is $n(a-1)/(n+1) = \lambda$. We might first determine
how small the probability $\alpha$ of making an incorrect statement should
be and also how small the expected relative length $\lambda$ of the confidence
interval should be. The sample size n of the serial numbers may then
be determined by the relations

$$
\frac{n(a-1)}{n+1} = \lambda \quad\text{and}\quad \alpha = a^{-n},
\qquad\text{or}\qquad
a = \lambda + 1 + \lambda x \quad\text{and}\quad a = \alpha^{-x}, \quad\text{where } x = 1/n.
$$

For any given values of $\alpha$ and $\lambda$, graphs of the functions $\lambda + 1 + \lambda x$ and
$\alpha^{-x}$ can be drawn. The value $x_0$ of x where the two graphs intersect is
then the desired solution of the last two equations. The reciprocal
$1/x_0 = n_0$ of this solution is the desired sample size. If then $n_0$ serial
numbers are obtained, we will have $(1-\alpha)\cdot 100\%$ confidence in the
statement that “p lies between g and ag.” The expected relative length
of this confidence interval is the desired value $\lambda$.
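The graphical intersection can also be found numerically. The following Python sketch is only an illustration under the continuous model of this section: it searches integer values of n directly (so no rounding of $1/x_0$ is needed), fixing a from $\alpha = a^{-n}$ for each n. The function name `sample_size_known_start` is hypothetical.

```python
def sample_size_known_start(alpha: float, lam: float, n_max: int = 100_000) -> int:
    """Smallest n for which the interval "g to a*g" has confidence 1 - alpha
    and expected relative length n(a - 1)/(n + 1) of at most lam.

    For each n the constant a is fixed by alpha = a**(-n), i.e. a = alpha**(-1/n).
    The expected relative length decreases as n grows, so the first n that
    meets the target is the smallest one.
    """
    for n in range(1, n_max + 1):
        a = alpha ** (-1.0 / n)
        if n * (a - 1.0) / (n + 1) <= lam:
            return n
    raise ValueError("no n up to n_max reaches the requested relative length")


# Example: 95% confidence and an expected relative length of at most 0.10.
n0 = sample_size_known_start(0.05, 0.10)
a0 = 0.05 ** (-1.0 / n0)
print(n0, round(a0, 3))
```

With $\alpha = 0.05$ and $\lambda = 0.10$ the search stops at $n = 31$, since $n = 30$ still gives an expected relative length slightly above 0.10.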

It is interesting to note that among all $(1-\alpha)\cdot 100\%$ confidence
intervals of the form “$a_1 g$ to $a_2 g$,” where $1 \le a_1 < a_2$, the confidence
interval with the smallest average length is obtained by taking $a_1 = 1$,
which is what we have done.

3.1.2. Testing hypotheses. Let us now consider the problem of testing
the hypothesis that the total production is a given value $p_0$. This
hypothesis will be rejected when the given value $p_0$ does not lie within
the confidence interval. In other words, having observed a sample of
serial numbers, we make a confidence statement that “p is between g
and ag,” and reject the “null” hypothesis that the total production is a
given value $p_0$ if this value lies outside the confidence interval. The
probability is $\alpha = 1/a^n$ of rejecting this hypothesis when it is in fact
true. We should like the probability of rejecting the null hypothesis
(that the total production is $p_0$) to be large when the hypothesis is in
fact false (i.e., when the total production is a value p different from
$p_0$). This probability $1-\beta$ of correctly rejecting the null hypothesis,
when in fact the true production is p, may be determined by the
following formula:

$$
1-\beta(p) =
\begin{cases}
1, & \text{when } p < p_0/a,\\
\alpha (p_0/p)^n, & \text{when } p_0/a \le p \le p_0,\\
1 - (1-\alpha)(p_0/p)^n, & \text{when } p > p_0.
\end{cases}
$$


We call $1-\beta(p)$ the power function of the test.

The formula for the power function $1-\beta(p)$ can be derived as
follows. The null hypothesis that the total production is a given value
$p_0$ will be rejected whenever $p_0 < g$ or $p_0 > ag$. But $g < p_0/a$ if and
only if all observations are less than $p_0/a$. The probability that an
observation will be less than $p_0/a$ is $p_0/(ap)$ when in fact the true
production is $p > p_0/a$. Hence the probability that all observations will
be less than $p_0/a$ (i.e., $g < p_0/a$) is $(p_0/ap)^n = \alpha (p_0/p)^n$ if $p > p_0/a$.
If $p < p_0/a$, rejection of the null hypothesis is certain since $g \le p < p_0/a$.
The probability that at least one observation will be greater than $p_0$
(i.e., $g > p_0$) is zero for $p \le p_0$, and it is $1-(p_0/p)^n$ for $p > p_0$. Since
the two rejection events are disjoint, adding their probabilities gives the
formula for the power function.

We might first determine how small the probability $\alpha$ of incorrectly
rejecting the null hypothesis should be and also how large the
probability $1-\beta$ should be of correctly rejecting the null hypothesis
when a particular alternative hypothesis $p = p_1$ (different from $p_0$) is
true. If the alternative hypothesis $p = p_1$ has been specified, the
appropriate sample size of the serial numbers required can be
determined by solving the equation

$$1 - \beta = 1 - \beta(p_1)$$

for the value of n. For example, if $p_0/a \le p_1 \le p_0$, then

$$
1 - \beta = \alpha (p_0/p_1)^n, \qquad
(1-\beta)/\alpha = (p_0/p_1)^n, \qquad
n = \frac{\log[(1-\beta)/\alpha]}{\log(p_0/p_1)}.
$$
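A small Python sketch of the power function and of the sample-size formula, assuming the continuous model of this section. The helper names and the numbers in the example ($p_0 = 1000$, $p_1 = 900$, power 0.90) are hypothetical.

```python
import math


def power_known_start(p: float, p0: float, n: int, alpha: float) -> float:
    """Power 1 - beta(p) of the test that rejects p = p0 whenever p0 lies
    outside the interval "g to a*g", where a = alpha**(-1/n)."""
    a = alpha ** (-1.0 / n)
    if p < p0 / a:
        return 1.0                        # g <= p < p0/a, so rejection is certain
    if p <= p0:
        return alpha * (p0 / p) ** n      # only the event g < p0/a can occur
    return 1.0 - (1.0 - alpha) * (p0 / p) ** n


def n_for_alternative_below(p0: float, p1: float, alpha: float, power: float) -> int:
    """n = log[(1 - beta)/alpha] / log(p0/p1), rounded up, for p1 < p0."""
    return math.ceil(math.log(power / alpha) / math.log(p0 / p1))


# Test p0 = 1000 at alpha = 0.05 and ask for power 0.90 against p1 = 900.
n = n_for_alternative_below(1000, 900, 0.05, 0.90)
print(n, power_known_start(900, 1000, n, 0.05))
```

The sample-size formula holds only in the middle branch $p_0/a \le p_1 \le p_0$. Because a itself depends on n, it is worth evaluating the power at the resulting n, as the last line does; if $p_1$ falls below $p_0/a$, the power is 1 and the requirement is met anyway.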


3.1.3. Estimates of bounded relative error. In [3], the problem of point 
estimation of p was considered and the unbiased estimate of p which 
had the smallest variance was given. The relation between this unbiased 
estimate and various other point estimates of p was examined. The 
problem of point estimation will now be considered from a somewhat 
different point of view. We might want to be “almost certain” that the 
estimate of production p obtained from the sample of n serial numbers 
will not be more than 1.2 times as large as the true production p, and 
will not be smaller than $0.8p$. If the estimate is of the form cg, where
$c \ge 1$ is a constant and g is the largest among the n serial numbers,
then the probability that the estimate cg will lie between $0.8p$ and $1.2p$
is

$$
\begin{cases}
(1.2/c)^n - (0.8/c)^n, & \text{when } c \ge 1.2,\\
1 - (0.8/c)^n, & \text{when } c \le 1.2.
\end{cases}
$$

The first expression decreases in c and the second increases, so the
probability that cg will lie between $0.8p$ and $1.2p$ is maximized when
$c = 1.2$ and, in that case, the probability is

$$1 - (0.8/1.2)^n.$$


The sample size n necessary in order that we can be “$(1-\alpha)\cdot 100\%$
confident” that $1.2g$ lies between $0.8p$ and $1.2p$ is determined by the
relation

$$1 - \alpha = 1 - (0.8/1.2)^n,$$

$$n = \log\alpha / \log(0.8/1.2).$$
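The coverage formula and the sample-size relation can be checked with a few lines of Python. This is a sketch under the continuous model, with the text's limits 0.8p and 1.2p as defaults; the function names are hypothetical.

```python
import math


def coverage_known_start(c: float, n: int, k1: float = 0.8, k2: float = 1.2) -> float:
    """Pr{k1*p <= c*g <= k2*p} in the continuous model, for c >= k1.

    Uses Pr{g <= t*p} = t**n for 0 <= t <= 1, so the answer is
    min(k2/c, 1)**n - (k1/c)**n.
    """
    return min(k2 / c, 1.0) ** n - (k1 / c) ** n


def n_bounded_error_known_start(alpha: float, k1: float = 0.8, k2: float = 1.2) -> int:
    """Smallest n with 1 - (k1/k2)**n >= 1 - alpha, i.e. n >= log(alpha)/log(k1/k2)."""
    return math.ceil(math.log(alpha) / math.log(k1 / k2))


n = n_bounded_error_known_start(0.05)
print(n, coverage_known_start(1.2, n))
```

For $\alpha = 0.05$ the relation gives $\log 0.05/\log(2/3) \approx 7.39$, so eight serial numbers suffice for the estimate $1.2g$.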


It may be desirable to determine an interval $c_1 g$ to $c_2 g$ ($1 \le c_1 \le c_2$) of
which we can be at least “$(1-\alpha)\cdot 100\%$ confident” that any given
estimate of the form cg in that interval ($c_1 \le c \le c_2$) will lie between $0.8p$
and $1.2p$. In order to obtain such an interval, it is clear that the sample
size n must be greater than $\log\alpha/\log(0.8/1.2)$. In that case, the values
of $c_1$ and $c_2$ are determined by

$$1 - \alpha = 1 - (0.8/c_1)^n$$

and

$$1 - \alpha = [(1.2)^n - (0.8)^n]/c_2^n, \quad\text{since } c_1 \le 1.2 \le c_2.$$


We might wish to determine an interval $c_3 g$ to $c_4 g$ of which we can
be “$(1-\alpha)\cdot 100\%$ confident” that the entire interval will lie between
$0.8p$ and $1.2p$. If $n > \log\alpha/\log(0.8/1.2)$, appropriate values of $c_3 < 1.2$
and $c_4 > 1.2$ can be determined by the relation

$$(1.2/c_4)^n - (0.8/c_3)^n = 1 - \alpha.$$


More generally, if an estimate cg is desired which maximizes the
probability of being included between $k_1 p$ and $k_2 p$ (where the k’s are
given constants such that $k_1 < k_2$), then the estimate should be $k_2 g$. If
the sample size n is greater than $\log\alpha/\log(k_1/k_2)$, then the probability
is at least $1-\alpha$ that any given estimate of the form g times a given
constant in the interval $c_1 g$ to $c_2 g$ will lie between $k_1 p$ and $k_2 p$, where

$$c_1^n = k_1^n/\alpha$$

and

$$c_2^n = (k_2^n - k_1^n)/(1-\alpha).$$

Also, the probability is $1-\alpha$ that the entire interval $c_3 g$ to $c_4 g$ will lie
between $k_1 p$ and $k_2 p$, where

$$(k_2/c_4)^n - (k_1/c_3)^n = 1 - \alpha.$$
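A sketch of the general relations for $c_1$ and $c_2$, assuming the continuous model of Section 3.1. The name `multiplier_range` is hypothetical, and the check $c_1 \le k_2 \le c_2$ encodes the condition $n > \log\alpha/\log(k_1/k_2)$.

```python
def multiplier_range(n, alpha, k1, k2):
    """Return (c1, c2) such that every estimate c*g with c1 <= c <= c2 lies in
    [k1*p, k2*p] with probability at least 1 - alpha (continuous model).

    Uses c1**n = k1**n / alpha and c2**n = (k2**n - k1**n) / (1 - alpha).
    """
    c1 = k1 * alpha ** (-1.0 / n)
    c2 = ((k2 ** n - k1 ** n) / (1.0 - alpha)) ** (1.0 / n)
    # Equivalent to n > log(alpha) / log(k1 / k2); otherwise no such interval exists.
    if not c1 <= k2 <= c2:
        raise ValueError("n too small: need n > log(alpha) / log(k1 / k2)")
    return c1, c2


c1, c2 = multiplier_range(31, 0.05, 0.8, 1.2)
print(round(c1, 3), round(c2, 3))
```

With n = 31, $\alpha = 0.05$ and the limits 0.8 and 1.2 this gives roughly $c_1 \approx 0.881$ and $c_2 \approx 1.202$. Since p can never be smaller than g, a value of $c_1$ below 1 would in practice be raised to 1, in line with the restriction $1 \le c_1$ in the text.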


In practice it may sometimes be possible to determine the constants
$k_1$ and $k_2$ so that if the estimate $\hat p$ of p is between $k_1 p$ and $k_2 p$ it will be
“close enough.” By “close enough” we mean that no loss is incurred
when an estimate $\hat p$ of p is made which is between $k_1 p$ and $k_2 p$. When
the estimate $\hat p$ is not between $k_1 p$ and $k_2 p$, then the loss incurred in
using an estimate which is not “close enough” may be some given
constant, say, 1. If the loss incurred in estimating p by $\hat p$ may in fact be
described by the function

$$
L(\hat p, p) =
\begin{cases}
0 & \text{when } k_1 p \le \hat p \le k_2 p,\\
1 & \text{otherwise,}
\end{cases}
$$

then the expected loss of an estimate equals one minus its chance of
being included between $k_1 p$ and $k_2 p$, so the estimate which maximizes
that chance also minimizes the expected loss. Hence the estimate $k_2 g$
which maximizes the chance of being included between $k_1 p$ and $k_2 p$
may be justified within the framework of the theory of statistical
decisions. For a more general discussion of the problem treated in this
paragraph the reader is referred to [2].
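To illustrate the decision-theoretic argument, the following Monte Carlo sketch estimates the expected 0-1 loss of several estimates cg. It assumes the continuous model, a hypothetical true production p = 1000 and n = 5, and reuses one random seed for every c so that the comparison uses common random numbers. The expected loss of $1.2g$ should come out near $(0.8/1.2)^5 \approx 0.13$, the smallest of the four.

```python
import random


def expected_loss_sim(c, n, p=1000.0, k1=0.8, k2=1.2, trials=20_000, seed=1):
    """Monte Carlo estimate of the expected 0-1 loss of the estimate c*g, where
    g is the largest of n draws from the continuous uniform distribution on (0, p).
    The loss is 0 when k1*p <= c*g <= k2*p and 1 otherwise."""
    rng = random.Random(seed)  # same seed for every c: common random numbers
    misses = 0
    for _ in range(trials):
        g = max(rng.uniform(0.0, p) for _ in range(n))
        if not k1 * p <= c * g <= k2 * p:
            misses += 1
    return misses / trials


n = 5
losses = {c: expected_loss_sim(c, n) for c in (1.0, 1.1, 1.2, 1.3)}
best = min(losses, key=losses.get)
print(best, losses)
```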

3.1.4. Tests of randomness and consecutive serial numbering. It has
been assumed herein that the n serial numbers obtained are a random
sample from all the serial numbers which are distributed uniformly
(numbered consecutively) between the initial serial number s and the
final serial number $s+p$, where s or $s+p$ (or both) may be unknown.
Before applying the statistical methods which have been based on this
assumption, it is desirable to examine the sample of n serial numbers
and test whether this assumption is justified. That is, the hypothesis
that the serial numbers were obtained from a random sample of n
observations from a uniform distribution between s and $s+p$ should
be tested. The question “Are the serial numbers a random sample?”
will be studied.

When the initial serial number s is known, it has been assumed that 
the serial numbers (after s has been subtracted from each serial num- 
ber) are uniformly distributed between 0 and p, where p is unknown. 
The n serial numbers have been assumed to be a random sample of 
serial numbers. Let us now consider the problem of testing the hy- 
pothesis that the n serial numbers are a random sample. We note that 
the hypothesis to be tested is not concerned with determining the un- 
known true value of the production p. Several tests are available for 
the hypothesis that the n serial numbers are a random sample from all 
the serial numbers uniformly distributed between 0 and p, where p is 
not specified. Consider all n serial numbers obtained except the largest 
serial number g. If the hypothesis to be tested is true, then this sample
of the $n-1$ smallest serial numbers will be uniformly distributed
between 0 and g, when g is given. Hence, dividing these $n-1$ serial
numbers by g, the numbers obtained will be uniformly distributed between
0 and 1, when the hypothesis to be tested is true. In order to test the
hypothesis of randomness, we might test whether these $n-1$ serial
numbers (divided by g) are uniformly distributed between 0 and 1. This
can be done using the Kolmogorov statistic or one of the other
statistics (e.g., chi-square, maximum difference, etc.) described in [1].
For example, if $n = 31$, a graph of the sample cumulative distribution of
the $n-1 = 30$ smallest serial numbers obtained (when divided by the
largest serial number obtained) can be drawn. The maximum absolute
difference between this sample cumulative and the cumulative of the
uniform distribution (the diagonal line) is then determined. From
Table 1 ($N = 30$) on page 428 of [1], we find that the probability is
.97745 that this maximum absolute difference between the cumula- 
tives will be less than 8/30. Hence, if a test is to be performed at the 
.02255 level of significance, we will accept the hypothesis of random- 
ness whenever the maximum absolute difference between the cumula- 
tives is less than 8/30. 
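A sketch of the randomness check for a known initial number, written in Python with only the standard library. It computes Kolmogorov's statistic D directly instead of reading it from a graph; the critical value (here 8/30 for N = 30, as quoted above from [1]) still has to be taken from tables. The function names and the two test samples are hypothetical.

```python
def ks_uniform_statistic(values):
    """Kolmogorov statistic D = max |F_m(u) - u| for m values in [0, 1]."""
    u = sorted(values)
    m = len(u)
    # The empirical CDF jumps from i/m to (i+1)/m at the (i+1)-th order statistic.
    return max(max((i + 1) / m - x, x - i / m) for i, x in enumerate(u))


def randomness_check_known_start(serials, s, critical):
    """Subtract the known initial number s, drop the largest value g, divide the
    other n - 1 values by g, and compare D with a tabulated critical value."""
    shifted = sorted(x - s for x in serials)
    g = shifted[-1]
    d = ks_uniform_statistic([x / g for x in shifted[:-1]])
    return d, d < critical


# n = 31 evenly spaced serial numbers; critical value 8/30 as quoted in the text.
even = [100 * k for k in range(1, 32)]
print(randomness_check_known_start(even, 0, 8 / 30))
```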

If the hypothesis of randomness is accepted, the analysis described 
in the preceding sections herein and in [3] could then be used. If the 
hypothesis is rejected, the sample of serial numbers should be ex- 
amined to determine what is nonrandom about it. On the basis of such 
an inquiry ad hoc methods for estimating the true production p could 
be determined. 

This approach may also be used to see whether there are changes in 
the procedure of serial numbering. If the procedure changes (i.e., if
the serial numbers are not uniformly distributed between the initial 
serial number and the final serial number), then a random sample of 
the serial numbers might indicate a nonuniform distribution. The test 
proposed in this section may be considered as a test of the hypothesis 
that serial numbering was done consecutively, as well as a “test of 
randomness.” 


3.2. Initial Number Unknown 


3.2.1. Confidence intervals. Let us first consider the problem of ob- 
taining confidence intervals for p. 

The probability that the difference d between the largest and smallest
among the n serial numbers is greater than $p/b$ ($b \ge 1$) may be
determined by the following relation (see [4], page 386):

$$
\begin{aligned}
\Pr\{p \ge d \ge p/b\} &= \Pr\{1 \ge d/p \ge 1/b\}\\
&= \int_{1/b}^{1} n(n-1)\, z^{n-2}(1-z)\,dz\\
&= 1 - n b^{1-n} + (n-1) b^{-n}\\
&= \Pr\{d \le p \le bd\}.
\end{aligned}
$$

Suppose the statement is made that “the total production p is between
d and bd,” where b is some constant. Then the probability $\alpha$ that this
statement will be incorrect is $\alpha = n b^{1-n} + (1-n) b^{-n}$. This probability $\alpha$
of making an incorrect statement may be made small by choosing a
large value for the constant b, or by obtaining a large sample of n serial
numbers. We might first determine how small the probability $\alpha$ of
making an incorrect statement should be, and then determine b or n
from the relation $\alpha = n b^{1-n} + (1-n) b^{-n}$. Tables are available which will
simplify the computations (see [5], [6]). A reprint of [6] may be
purchased from Biometrika.

Let us illustrate the methods just described by a numerical example.
If $\alpha$ is chosen equal to 0.05, the value of $1/b$ can be determined from
the entries in column $\nu_1 = 4$ on p. 174 of [6], where $2(n-1) = \nu_2$. If $n = 31$
serial numbers have been obtained, then $1/b$ is determined by the
entry in the fourth column ($\nu_1 = 4$) and third row from the bottom
($\nu_2 = 60$) of the table on page 174 in [6]. Hence $1/b = .85591$ and $b = 1.17$.
Upon observing 31 serial numbers, we will be 95% confident in the
statement that “the total production p lies between d and 1.17d.”
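Instead of the tables in [5] and [6], the equation $\alpha = n b^{1-n} + (1-n) b^{-n}$ can be solved numerically. The Python sketch below, an illustration rather than the author's procedure, bisects on $t = 1/b$, where the right-hand side becomes $n t^{n-1} - (n-1) t^n$; for n = 31 and $\alpha = 0.05$ it reproduces the tabled value $1/b \approx .85591$, i.e. $b \approx 1.17$. The function names are hypothetical.

```python
def range_cdf(t, n):
    """Pr{d/p <= t} = n t**(n-1) - (n-1) t**n for the range of n uniforms, 0 <= t <= 1."""
    return n * t ** (n - 1) - (n - 1) * t ** n


def solve_b(n, alpha, tol=1e-12):
    """Solve alpha = n b**(1-n) + (1-n) b**(-n) for b >= 1.

    With t = 1/b the right-hand side equals range_cdf(t, n), which increases
    on [0, 1], so bisection on t converges to the unique root."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if range_cdf(mid, n) < alpha:
            lo = mid
        else:
            hi = mid
    return 1.0 / ((lo + hi) / 2)


b = solve_b(31, 0.05)
print(round(1 / b, 5), round(b, 2))
```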
The length of the 95% confidence interval for $n = 31$ serial numbers
is $d(1.17-1) = 0.17d$. Since the expected value of d is $p(n-1)/(n+1)$,²
the expected length of the interval is $(0.17)\,p(n-1)/(n+1) = 0.16p$.
The expected relative length of the interval is $\lambda = 0.16$. We might first
determine how small the probability $\alpha$ of making an incorrect
statement should be and also how small the expected relative length of
the confidence interval should be. Then the relations

$$(b-1)(n-1)/(n+1) = \lambda \quad\text{and}\quad n b^{1-n} + (1-n) b^{-n} = \alpha$$

can be used to determine b and the necessary sample size n. Writing
$1/b = y$ and $1/(n-1) = x$, the first relation can be replaced by
$1/y = \lambda + 1 + 2\lambda x$.

Other methods for determining n may also be used; e.g., successive 
approximation procedures. 
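One successive-approximation scheme, sketched in Python under the continuous model: for each candidate n, solve for b by bisection and compute the expected relative length $(b-1)(n-1)/(n+1)$, stopping at the first n that meets the target. The helpers repeat the bisection so that the block stands alone; the function names are hypothetical. With $\alpha = 0.05$ and $\lambda = 0.16$ the search returns n = 31, consistent with the worked example.

```python
def range_cdf(t, n):
    # Pr{d/p <= t} for the range of n uniforms on [0, 1].
    return n * t ** (n - 1) - (n - 1) * t ** n


def solve_b(n, alpha, tol=1e-12):
    # Bisection on t = 1/b for range_cdf(t, n) = alpha.
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if range_cdf(mid, n) < alpha:
            lo = mid
        else:
            hi = mid
    return 1.0 / ((lo + hi) / 2)


def sample_size_unknown_start(alpha, lam, n_max=100_000):
    """Smallest n >= 2 whose "d to b*d" interval (confidence 1 - alpha) has
    expected relative length (b - 1)(n - 1)/(n + 1) of at most lam."""
    for n in range(2, n_max + 1):
        b = solve_b(n, alpha)
        if (b - 1) * (n - 1) / (n + 1) <= lam:
            return n
    raise ValueError("no n up to n_max reaches the requested relative length")


print(sample_size_unknown_start(0.05, 0.16))
```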

3.2.2. Testing hypotheses. The problem of testing the hypothesis that 
the total production is a given value po may be studied in the same 
way as was done in Section 3.1.2. Direct computations may be made 
for any test at a given level a of significance in order to determine the 
power function of the test. The tables in [5] and [6] may be used to 
simplify computation. 

3.2.3. Estimates of bounded relative error. Let us now consider the 
problem of point estimation of p from the same point of view as in 
Section 3.1.3. We might want to be “almost certain” that the estimate 
of p obtained from the sample of n serial numbers “will not be more 
than 1.2p nor smaller than 0.8p.” If the estimate is of the form cd, 
where c>1 is a constant and d is the difference between the largest and 
smallest among the n observed serial numbers, then the probability 
that the estimate will be between 0.8p and 1.2p is maximized when 


$$(1.2)^{n-1}(1 - 1.2/c) - (0.8)^{n-1}(1 - 0.8/c) = 0$$
or when
$$c = \frac{(1.2)^n - (0.8)^n}{(1.2)^{n-1} - (0.8)^{n-1}}. \tag{1}$$


The sample size n necessary in order that we can be “$(1-\alpha)\cdot 100\%$
confident” that cd lies between $0.8p$ and $1.2p$ is determined by the
relation

$$1 - \alpha = \int_{0.8/c}^{1.2/c} n(n-1)\, z^{n-2}(1-z)\,dz = \Pr\{0.8/c < d/p \le 1.2/c\} \tag{2}$$

and relation (1).

² The reader will notice that the expected value of d presented on page 627 of [3] is
$(p+1)(n-1)/(n+1)$. The formula in [3] was derived for the exact model, whereas the formula in this
text is for the continuous variation model. Hence, $d(n+1)/(n-1) - 1$ is the unbiased estimate of p in
the exact model (see [3]), whereas the unbiased estimate of p for the continuous variation model is
$d(n+1)/(n-1)$.
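Relations (1) and (2) are easy to evaluate directly. This Python sketch (hypothetical function names, continuous model) computes c from relation (1) and the coverage probability from relation (2), using $\Pr\{d/p \le t\} = n t^{n-1} - (n-1) t^n$; for n = 31 it gives $c \approx 1.200$ and a coverage of about .99994, consistent with the .9999 read from [5] further on.

```python
def range_cdf(t, n):
    """Pr{d/p <= t} for the range of n uniforms; equals 1 once t >= 1."""
    if t >= 1.0:
        return 1.0
    return n * t ** (n - 1) - (n - 1) * t ** n


def optimal_c(n, k1=0.8, k2=1.2):
    """Relation (1): c = (k2**n - k1**n) / (k2**(n-1) - k1**(n-1))."""
    return (k2 ** n - k1 ** n) / (k2 ** (n - 1) - k1 ** (n - 1))


def coverage_range(c, n, k1=0.8, k2=1.2):
    """Relation (2): Pr{k1/c < d/p <= k2/c}, the chance that c*d lies in [k1*p, k2*p]."""
    return range_cdf(k2 / c, n) - range_cdf(k1 / c, n)


c = optimal_c(31)
print(round(c, 3), coverage_range(c, 31))
```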

If the sample size is larger than the sample size required by the
preceding relations (1) and (2), two constants $c_1 \le c$ and $c_2 > c$ can be
determined, where c is defined by relation (1), such that we can be at
least “$(1-\alpha)\cdot 100\%$ confident” that any given estimate of the form d
times a given constant in the interval $c_1 d$ to $c_2 d$ will lie between $0.8p$
and $1.2p$. The values of $c_1$ and $c_2$ are determined by the relations

$$1 - \alpha = \Pr\{0.8/c_2 < d/p \le 1.2/c_2\}$$

and

$$1 - \alpha = \Pr\{0.8/c_1 \le d/p\}.$$


It may be desirable to determine an interval $c_3 d$ to $c_4 d$ of which we
can be “$(1-\alpha)\cdot 100\%$ confident” that the entire interval will lie
between $0.8p$ and $1.2p$. When the sample size n is larger than the sample
size required by relations (1) and (2), appropriate values of $c_3 < c$ and
$c_4 > c$ may be determined by the relation

$$1 - \alpha = \Pr\{0.8/c_3 \le d/p \le 1.2/c_4\}.$$


The numbers 0.8 and 1.2 can be replaced by $k_1$ and $k_2$ respectively in
the preceding discussion to obtain more general results. A justification
of estimates of bounded relative error may be presented, as was done
in Section 3.1.3, within the framework of the theory of statistical
decisions. The estimate cd which maximizes the chance of being included
between $k_1 p$ and $k_2 p$ is also the estimate which minimizes the expected
loss if no loss is incurred when the estimate is between $k_1 p$ and $k_2 p$ and a
constant loss is incurred otherwise.

Let us illustrate the computations required in the preceding discus- 
sion by considering a sample of n=31 serial numbers. The value of c 
as defined by relation (1) is equal to 1.20 (to three significant digits), 
when n =31. Hence, the estimate 1.20 d maximizes the chance of being 
included between 0.8p and 1.2p. From the tables on page 54 of [5] we 
find that the chance is .9999 that 1.20d will lie between 0.8p and 1.2p. 

Suppose we wish to be 95% confident of all statements made, i.e.,
$\alpha = .05$. The second column ($p = 30$) of the table ($q = 2$) on page 54 of
[5] presents the distribution of d. Using this information together with
the entry in the eighth column ($\nu_1 = 60$) and the fourth row ($\nu_2 = 4$) on
page 175 of [6], we see that $c_2$ is about $1.2/(1 - .011585) = 1.21$ (to three
significant digits). Hence if the estimate of production p based on 31
serial numbers is 1.21d, then the probability is 0.95 that this estimate
will be between 0.8p and 1.2p. From the table on page 174 of [6]
($\nu_1 = 4$, $\nu_2 = 60$) we see that

$$\Pr\{0.8 \le d/p\} > .95.$$

Hence, we are at least 95% confident that any given estimate (of the
form d times a given constant) in the interval d to 1.21d will lie
between 0.8p and 1.2p. We also find from the tables that the probability
is about .95 that the entire interval d to 1.21d will lie between 0.8p and 
1.2p. 
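The value $c_2 \approx 1.21$ can be reproduced without the tables by bisection on c beyond the optimum of relation (1), where the coverage decreases in c. A Python sketch under the continuous model, with hypothetical function names:

```python
def range_cdf(t, n):
    """Pr{d/p <= t} for the range of n uniforms; equals 1 once t >= 1."""
    if t >= 1.0:
        return 1.0
    return n * t ** (n - 1) - (n - 1) * t ** n


def coverage_range(c, n, k1=0.8, k2=1.2):
    """Pr{k1/c < d/p <= k2/c}, the chance that c*d lies in [k1*p, k2*p]."""
    return range_cdf(k2 / c, n) - range_cdf(k1 / c, n)


def c_upper(n, alpha, k1=0.8, k2=1.2, tol=1e-10):
    """Largest multiplier c2 (beyond the optimum of relation (1)) for which
    c2*d still lies in [k1*p, k2*p] with probability 1 - alpha."""
    lo = (k2 ** n - k1 ** n) / (k2 ** (n - 1) - k1 ** (n - 1))  # relation (1)
    if coverage_range(lo, n, k1, k2) < 1 - alpha:
        raise ValueError("sample too small for this confidence level")
    hi = 10 * k2  # coverage is negligible this far out
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if coverage_range(mid, n, k1, k2) >= 1 - alpha:
            lo = mid
        else:
            hi = mid
    return lo


c2 = c_upper(31, 0.05)
print(round(c2, 2), 1 - range_cdf(0.8, 31))
```

The second printed value, $\Pr\{0.8 \le d/p\}$, comes out near .991, comfortably above .95.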

3.2.4. Tests of randomness and consecutive serial numbering. Let us 
consider the hypothesis that the n serial numbers obtained are a ran- 
dom sample from the population of uniformly distributed serial num- 
bers. In the case where the initial number is unknown, we consider all 
n serial numbers obtained except the largest serial number g and the
smallest serial number f. If the hypothesis to be tested is true, then this
sample of $n-2$ serial numbers (all except g and f) will be uniformly
distributed between f and g, when f and g are given. Hence, subtracting
f from these $n-2$ serial numbers and then dividing the numbers
obtained by $g-f$, the adjusted numbers will be uniformly distributed
between 0 and 1, when the hypothesis to be tested is true. In order
to test the hypothesis of randomness, we might test whether these
$n-2$ adjusted serial numbers (when f is subtracted from the serial
numbers and the numbers obtained are then divided by $g-f$) are
uniformly distributed between 0 and 1. This can be done using the
Kolmogorov statistic or one of the other statistics (e.g., chi-square,
maximum difference, etc.) as mentioned in Section 3.1.4. For example, if
$n = 31$, the sample cumulative distribution of the $n-2 = 29$ adjusted serial
numbers obtained can be graphed. The maximum absolute difference 
between this sample cumulative and the cumulative of the uniform 
distribution (the diagonal line) can then be determined. From Table 1 
(N =29) on page 428 of [1], we note that the probability is .98076 that 
this maximum absolute difference between the cumulatives will be less 
than 8/29. Hence, if a test is to be performed at the .01924 level of 
significance, the hypothesis of randomness and consecutive (uni- 
formly distributed) serial numbers will be accepted whenever the maxi- 
mum absolute difference between the cumulatives is less than 8/29. 
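For the data of Section 5, Kolmogorov's statistic for the 29 adjusted serial numbers can be computed exactly rather than read from a graph. In this Python sketch (hypothetical function names), the direct computation gives $D \approx .162$, close to the .16 read from Figure 5.1 and well below the critical value $8/29 \approx .276$.

```python
SERIALS = [
    83, 135, 274, 380, 668, 895, 955, 964, 1113, 1174, 1210, 1344, 1387, 1414,
    1610, 1668, 1689, 1756, 1865, 1874, 1880, 1936, 2005, 2006, 2065, 2157, 2220,
    2224, 2396, 2543, 2787,
]  # the 31 serial numbers reported in Section 5


def ks_uniform_statistic(values):
    """Kolmogorov statistic D = max |F_m(u) - u| for m values in [0, 1]."""
    u = sorted(values)
    m = len(u)
    return max(max((i + 1) / m - x, x - i / m) for i, x in enumerate(u))


def randomness_check_unknown_start(serials):
    """Drop the smallest (f) and largest (g) numbers, map the other n - 2 values
    to (x - f)/(g - f), and return Kolmogorov's D against the uniform on [0, 1]."""
    x = sorted(serials)
    f, g = x[0], x[-1]
    return ks_uniform_statistic([(v - f) / (g - f) for v in x[1:-1]])


d = randomness_check_unknown_start(SERIALS)
print(round(d, 3), d < 8 / 29)  # 8/29 is the .01924-level critical value quoted above
```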
4. THE EXACT MODEL


In the preceding sections we have assumed that the serial numbers
have a continuous uniform distribution between the initial serial
number s and the final serial number $s+p$. This was done in order to
simplify the problem and because for practical problems (when the value
of p is large) the results obtained will serve as an approximation to
results for the exact model of a discrete, finite, uniform population (see
[3]).

On page 624 of [3], the exact confidence intervals and tests of hy- 
potheses are obtained for the case where the initial serial number is 
known. Since exact confidence intervals and tests of hypotheses were 
not discussed in [3] for the case where the initial serial number is un- 
known, we shall now consider that problem. 

From [3], we see that the probability that the difference d between
the largest and smallest among n serial numbers will be less than or
equal to a given constant c may be determined from the relation

$$
\begin{aligned}
\Pr\{d \le c \mid n, p\} &= \sum_{d=n-1}^{c} n(n-1)\,(d-1)^{(n-2)}\,(p-d)/p^{(n)}\\
&= c^{(n-1)}\,[np - (n-1)(c+1)]/p^{(n)},
\end{aligned}
$$

where $c^{(m)} = c!/(c-m)!$. As a first approximation to this probability we
might replace the exact model by the model of a continuous uniform
distribution and obtain $\Pr'\{d \le c \mid n, p\} = n(c/p)^{n-1} - (n-1)(c/p)^n$, for
which convenient tables are available (see [5] and [6]).
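The exact distribution of d is easy to check by brute force for small p. The Python sketch below uses exact rational arithmetic to compare the sum, the closed form $c^{(n-1)}[np - (n-1)(c+1)]/p^{(n)}$ as reconstructed here (it follows from counting, for each smallest sampled number, the n-subsets that fit within a window of length c), and full enumeration of the n-subsets of $\{1, \dots, p\}$; it also prints the continuous approximation for comparison. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def falling(x, m):
    """Falling factorial x^(m) = x (x - 1) ... (x - m + 1); equals 1 when m = 0."""
    out = 1
    for i in range(m):
        out *= x - i
    return out


def range_cdf_sum(c, n, p):
    """Pr{d <= c | n, p} as the sum of n(n-1)(d-1)^(n-2)(p-d)/p^(n) over d = n-1..c."""
    total = sum(n * (n - 1) * falling(d - 1, n - 2) * (p - d) for d in range(n - 1, c + 1))
    return Fraction(total, falling(p, n))


def range_cdf_closed(c, n, p):
    """Closed form c^(n-1) [n p - (n-1)(c+1)] / p^(n), for n-1 <= c <= p-1."""
    return Fraction(falling(c, n - 1) * (n * p - (n - 1) * (c + 1)), falling(p, n))


def range_cdf_enum(c, n, p):
    """Brute force: share of n-subsets of {1, ..., p} whose range is at most c."""
    subsets = list(combinations(range(1, p + 1), n))
    return Fraction(sum(1 for s in subsets if s[-1] - s[0] <= c), len(subsets))


def range_cdf_continuous(c, n, p):
    """The continuous approximation n (c/p)^(n-1) - (n-1) (c/p)^n."""
    t = c / p
    return n * t ** (n - 1) - (n - 1) * t ** n


print(range_cdf_closed(4, 3, 10), range_cdf_continuous(4, 3, 10))
```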

Suppose we wish to test the null hypothesis that p= po against the 
alternative hypothesis p>o. Then the rejection region for a signifi- 
cance test at level a is obviously d>c,+1 where c; is the largest integer 
satisfying 


Pr {d Sa |n, po} <1—a. 


If we wish to test the null hypothesis p=po against the alternative 
hypothesis po, then the rejection region for a significance test at 
level a is d Sc where cz is the smallest integer satisfying 


Pr {d S c| n, po} > a. 


A two-sided test at level α of the null hypothesis $p = p_0$ against the two-sided alternative $p \ne p_0$ is defined by the acceptance region $c_2 \le d \le p_0 - 1$. A two-sided test at the 2α level might be based on the acceptance region $c_2 \le d \le c_1 + 1$.
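The following minimal sketch shows how $c_1$ and $c_2$ might be located by a direct search over the exact distribution. It repeats the closed-form probability so that it runs on its own. The function names and the choice $p_0 = 2886$, α = .05 are illustrative assumptions.

```python
from math import perm


def cdf_range_exact(c: int, n: int, p: int) -> float:
    """Pr{d <= c | n, p} for the exact (discrete) model."""
    if c < n - 1:
        return 0.0
    c = min(c, p - 1)
    return (n * p * perm(c, n - 1) - (n - 1) * perm(c + 1, n)) / perm(p, n)


def c1_upper(n: int, p0: int, alpha: float) -> int:
    """Largest integer c1 with Pr{d <= c1 | n, p0} < 1 - alpha."""
    c = n - 2  # Pr{d <= n - 2} = 0, which is always below 1 - alpha
    while cdf_range_exact(c + 1, n, p0) < 1 - alpha:
        c += 1
    return c


def c2_lower(n: int, p0: int, alpha: float) -> int:
    """Smallest integer c2 with Pr{d <= c2 | n, p0} > alpha."""
    c = n - 1
    while cdf_range_exact(c, n, p0) <= alpha:
        c += 1
    return c


def reject_h0(d: int, n: int, p0: int, alpha: float, alternative: str) -> bool:
    """True if H0: p = p0 is rejected at level alpha for the observed range d."""
    if alternative == "greater":    # H1: p > p0, reject when d > c1 + 1
        return d > c1_upper(n, p0, alpha) + 1
    if alternative == "less":       # H1: p < p0, reject when d < c2
        return d < c2_lower(n, p0, alpha)
    if alternative == "two-sided":  # accept when c2 <= d <= p0 - 1
        return d < c2_lower(n, p0, alpha) or d > p0 - 1
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")


n, p0, alpha = 31, 2886, 0.05  # illustrative values
c1, c2 = c1_upper(n, p0, alpha), c2_lower(n, p0, alpha)
print(c1, c2, reject_h0(2704, n, p0, alpha, "less"))
```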
The results of the preceding paragraph may now be used to obtain confidence intervals. That is, the left-sided 1 − α confidence interval is $p \ge k_1$, where $k_1$ is the smallest integer satisfying

$$\Pr\{d < d_0 \mid n, k_1\} < 1 - \alpha,$$

and $d_0$ is the actual difference between the largest and smallest among the n serial numbers observed. The right-sided 1 − α confidence interval is $p \le k_2$, where $k_2$ is the largest integer satisfying

$$\Pr\{d \le d_0 \mid n, k_2\} > \alpha.$$

A two-sided 1 − α confidence interval is $d_0 + 1 \le p \le k_2$, and a two-sided 1 − 2α confidence interval is $k_1 \le p \le k_2$.
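The confidence limits can be found the same way, by stepping through integer values of p. This sketch applies the definitions of $k_1$ and $k_2$ to n = 31 and $d_0 = 2704$. The function names are hypothetical, and α = .05 is chosen to mirror the 95 per cent interval of Section 5.

```python
from math import perm


def cdf_range_exact(c: int, n: int, p: int) -> float:
    """Pr{d <= c | n, p} for the exact (discrete) model."""
    if c < n - 1:
        return 0.0
    c = min(c, p - 1)
    return (n * p * perm(c, n - 1) - (n - 1) * perm(c + 1, n)) / perm(p, n)


def k1_lower(n: int, d0: int, alpha: float) -> int:
    """Smallest integer k with Pr{d < d0 | n, k} < 1 - alpha."""
    k = d0 + 1  # p can never be smaller than d0 + 1
    while cdf_range_exact(d0 - 1, n, k) >= 1 - alpha:
        k += 1
    return k


def k2_upper(n: int, d0: int, alpha: float) -> int:
    """Largest integer k with Pr{d <= d0 | n, k} > alpha."""
    k = d0 + 1  # Pr{d <= d0 | n, d0 + 1} = 1, which exceeds alpha
    while cdf_range_exact(d0, n, k + 1) > alpha:
        k += 1
    return k


n, d0, alpha = 31, 2787 - 83, 0.05
k1, k2 = k1_lower(n, d0, alpha), k2_upper(n, d0, alpha)
print(f"two-sided {1 - alpha:.0%} interval: {d0 + 1} <= p <= {k2}")
print(f"two-sided {1 - 2 * alpha:.0%} interval: {k1} <= p <= {k2}")
```

The exact model counts items rather than measuring a continuous span. For that reason, $k_2$ should land close to, but not necessarily exactly on, the continuous-model limit 3163.7 quoted in Section 5.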


5. AN APPLICATION 


The Division of the Social Sciences of the University of Chicago has been using equipment (desks, bookcases, etc.) upon which serial numbers had been placed. The question was raised as to how many such pieces of equipment there were.

The serial numbers on thirty-one pieces of equipment were observed. The 31 serial numbers obtained were:

83, 135, 274, 380, 668, 895, 955, 964, 1113, 1174, 1210, 1344, 1387, 1414, 1610, 1668, 1689, 1756, 1865, 1874, 1880, 1936, 2005, 2006, 2065, 2157, 2220, 2224, 2396, 2543, 2787.


The serial numbers range from 83 to 2787. The sample cumulative distribution of the 29 serial numbers obtained between the smallest and largest serial numbers is graphed in Figure 5.1. The diagonal line in Figure 5.1 represents the uniform cumulative distribution between the smallest serial number 83 and the largest serial number 2787. From Figure 5.1 we see that the maximum absolute difference between the two cumulative distributions is (9.65 − 5)/29 = .16. If the serial numbers obtained are a random sample from a population of uniformly distributed serial numbers, then there is more than a 1 − .68280 = .3172 probability of obtaining a maximum absolute difference of .16 or larger (see page 428, Table 1, N = 29, in [1]). Hence the null hypothesis that the serial numbers obtained are a random sample from a population of consecutive serial numbers is accepted.
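The graphical reading of the maximum deviation can be reproduced numerically. The sketch below is our own. It computes the Kolmogorov-Smirnov distance between the empirical distribution of the 29 interior serial numbers and the uniform distribution on [83, 2787]. Because it checks the step function on both sides of each observation, its value may differ slightly from a reading taken off the graph.

```python
serials = [83, 135, 274, 380, 668, 895, 955, 964, 1113, 1174, 1210, 1344, 1387, 1414,
           1610, 1668, 1689, 1756, 1865, 1874, 1880, 1936, 2005, 2006, 2065, 2157, 2220,
           2224, 2396, 2543, 2787]

lo, hi = min(serials), max(serials)
interior = sorted(serials)[1:-1]               # the 29 numbers strictly between 83 and 2787
m = len(interior)
u = [(x - lo) / (hi - lo) for x in interior]   # uniform CDF on [83, 2787] at each value

# Largest gap between the uniform CDF and the sample step function, checked
# just before (height i/m) and at (height (i+1)/m) each observation.
ks_d = max(max((i + 1) / m - ui, ui - i / m) for i, ui in enumerate(u))
print(m, round(ks_d, 4))
```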

From Section 3.2.1 we see that the unbiased estimate of the total number p of pieces of equipment is d·32/30 = (2787 − 83)·32/30 = (2704)·32/30 = 86528/30 = 2884.3 for the continuous variation model (2883.3 for the exact model). Also, the 95% confidence interval for p is “2704 ≤ p ≤ 1.17(2704)” or “2704 ≤ p ≤ 3163.7.”
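The arithmetic of this paragraph can be sketched as follows. The continuous-model estimate is d(n + 1)/(n − 1). For the exact model we assume E(d) = (n − 1)(p + 1)/(n + 1). That assumption gives the estimate d(n + 1)/(n − 1) − 1, which reproduces the figure 2883.3. The upper 95 per cent limit is taken as d/x, where x solves $n x^{n-1} - (n-1)x^n = .05$ under the continuous model. The bisection helper is our own.

```python
n = 31
d = 2787 - 83                                # observed range, 2704

p_hat_continuous = d * (n + 1) / (n - 1)     # unbiased under the continuous model
p_hat_exact = p_hat_continuous - 1           # assumes E(d) = (n-1)(p+1)/(n+1)


def range_cdf_continuous(x: float, n: int) -> float:
    """Pr{d/p <= x} = n x^(n-1) - (n-1) x^n for 0 <= x <= 1."""
    return n * x ** (n - 1) - (n - 1) * x ** n


def solve_x(n: int, alpha: float) -> float:
    """Bisection for x with Pr{d/p <= x} = alpha; the CDF increases on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if range_cdf_continuous(mid, n) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


x = solve_x(n, 0.05)
print(round(p_hat_continuous, 1), round(p_hat_exact, 1))
print(f"95% interval: {d} <= p <= {d / x:.1f}  (factor {1 / x:.3f})")
```

The factor 1/x comes out near 1.17, in line with the rounded factor used in the text.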

Fig. 5.1. Sample Cumulative Distribution of the 29 Observed Serial Numbers Between the Smallest and Largest.

From Section 3.2.3 we see that the chance is .9999 that the estimate 1.20d = 1.20(2704) = 3244.8 will be within 20 per cent of p. This estimate minimizes the expected loss if no loss is incurred when the estimate is within 20 per cent of p and a constant loss is incurred otherwise. The probability is .95 that the estimate 1.21d = 1.21(2704) = 3271.8 will be within 20 per cent of p. In fact, the probability is approximately .95 that the entire interval d to 1.21d, or 2704 to 3271.8, will lie within 20 per cent of p.

It was a relatively simple task to obtain the serial numbers of 31 pieces of equipment and then to estimate p in the manner described herein. Determining the true value of p (the total number of pieces of equipment) was much more time consuming. These pieces of equipment had been purchased in the period between 1928 and 1934, and no records were immediately available to determine the total purchase. We are indebted to Mrs. Ruth Denney, Administrative Assistant to the Dean of the Social Sciences. After several days and many inquiries, Mrs. Denney was able to locate the records and found that the total number p of pieces of equipment was 2886.
THE PROBLEM OF AUTOCORRELATION IN
REGRESSION ANALYSIS* 


R. L. ANDERSON 
North Carolina State College 


1. INTRODUCTION


IN LEAST squares analysis, the usual regression model is

$$Y_t = \beta_0 + \sum_{i=1}^{r} \beta_i X_{it} + \epsilon_t, \qquad t = 1, 2, \ldots, n,$$

where the predictors, the X's, are assumed fixed in repeated sampling and the ε's independently distributed with the same variance, σ². The X's may be merely dummy variates (0 or 1), as in classification data (often called analysis of variance data). When tests of significance or confidence limits for the parameters are used, one usually assumes normality of the ε's. Even if the X's and Y follow a multivariate normal distribution, the least squares point and interval estimates of the β's can be used, and the usual null tests applied.

If the n Y's are successive observations in time, the experimenter frequently wishes to investigate the nature of the response curve over time. In this case he might set $X_{it} = t^i$, or he might use the method of harmonic analysis to search for periodicities in Y. In other cases, the assumed model might involve lagged values of Y as predictors. For example,

$$Y_t = \beta_0 + \sum_{i=1}^{r} \beta_i Y_{t-i} + \epsilon_t. \qquad (1)$$

This is an autoregressive model. Finally, one could use a combined regression model with lagged Y's, present X's, lagged X's, and time as predictors. The method of least squares is applicable for autoregressive models, provided n is large (see Mann and Wald [6]).

One of the major difficulties with the use of least squares methods with time series is the strong possibility that the ε's are not independent. Aitken [1] pointed out that it is correlation of the ε's and not of the Y's which is to be avoided. It is possible that if the X's and Y's are both correlated in time, the errors will be relatively uncorrelated. A considerable amount of research has been devoted to the problem of testing for the existence of correlation in the errors, but all too little on the more important problem of the best estimation procedure when correlations do exist. Summaries of current methods of analyzing time series are given by Kendall [5] and Tintner [7].

* Paper presented at Annual Meeting of American Statistical Association, Chicago, December 27, 1952.

AMERICAN STATISTICAL ASSOCIATION JOURNAL, MARCH 1954
The correlation of successive items in a time series was called lagged serial correlation by Yule [9]. At the present time, it is more popular to use the term serial correlation to apply to the correlation between two series and the word autocorrelation for this correlation between successive items in a given series (see, for example, Tintner [7]). I shall use this distinction. Many of the earlier papers on this subject, however, use the Yule terminology, as can be noted from the Bibliography. If we have a set of equally spaced values, $Z_1, Z_2, \ldots, Z_n$, selected from a population with zero mean, the autocorrelation coefficient of lag L is

$$r_L = \frac{\mathrm{S}\, Z_t Z_{t+L}}{\sqrt{\mathrm{S}\, Z_t^2 \;\mathrm{S}\, Z_{t+L}^2}}, \qquad (2)$$

where t goes from 1 to n − L.¹ Most writers have preferred to use a definition in which the denominator is simply

$$\mathop{\mathrm{S}}_{t=1}^{n} Z_t^2. \qquad (3)$$
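Definitions (2) and (3) differ only in the denominator. A short Python sketch makes the difference explicit; the function names are ours. Like the text, it assumes the Z's come from a zero-mean population, so no mean is subtracted.

```python
from math import sqrt
from typing import Sequence


def r_lag_eq2(z: Sequence[float], lag: int) -> float:
    """Definition (2): every sum runs over t = 1, ..., n - L."""
    pairs = range(len(z) - lag)
    num = sum(z[t] * z[t + lag] for t in pairs)
    den = sqrt(sum(z[t] ** 2 for t in pairs) * sum(z[t + lag] ** 2 for t in pairs))
    return num / den


def r_lag_eq3(z: Sequence[float], lag: int) -> float:
    """Definition (3): same numerator, denominator summed over all n values."""
    num = sum(z[t] * z[t + lag] for t in range(len(z) - lag))
    return num / sum(v * v for v in z)


z = [1.0, -1.0, 1.0, -1.0]
print(r_lag_eq2(z, 1), r_lag_eq3(z, 1))
```

Definition (3) always has the larger denominator, so for short series it pulls the coefficient towards zero relative to (2).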

A symposium on autocorrelated time series analysis was held in 1946 under the auspices of the Royal Statistical Society. M. S. Bartlett [2] presented a general paper, and Foster [4] and Cunningham and Hynd [3] presented papers on the use of autocorrelation methods in non-economic fields. J. W. Tukey, at the 1951 Annual Meeting of the American Statistical Association, proposed the use of the autocovariance (the numerator of $r_L$) in his method of spectrum analysis of time series.


2. TESTS OF SIGNIFICANCE FOR AUTOCORRELATION 


Yule [10] showed that the distribution of the correlation between two autocorrelated series tends to be U-shaped, with a majority of the correlations near ±1. Bartlett [15] said that if the errors were autocorrelated, we could use the usual tests of significance of regression coefficients on a preliminary basis. If these coefficients were non-significant, accept the result; if they were significant, a test was needed which took account of the autocorrelation.





1 S will be used to indicate summation over sample values. 
One of the common methods of analyzing a single time series is harmonic analysis, in which the X's are sine and cosine terms. Fisher [18] presented a test of significance of the various amplitudes (the β's), in the restricted case of independent errors. Wilson [41] suggested that one compute successive lagged autocorrelation coefficients until the first non-significant one is reached, and then use this lag (L) as an indication of the proportion of independent observations (1/L).

Three possible models are used to explain stationary trend-free time series data. Wold [8] indicated that the choice depended upon the relationship of successive true autocorrelation coefficients, $\rho_L$. These are usually displayed in a correlogram, as shown in the figure below.

[Figure: correlogram, $\rho_L$ plotted against lag L.]

(i) Repeated non-damped cycles: use harmonic analysis.
(ii) Damped correlations but with $|\rho_L| > 0$: use linear autoregression.
(iii) Damped correlations, with $\rho_L = 0$ for $L > m$: use the method of moving averages,

$$Y_t = \epsilon_t + \sum_{s=1}^{m} \gamma_s \epsilon_{t-s}. \qquad (4)$$

Tintner [7] also discusses these methods in detail. Bartlett [2] cautions about the use of empirical correlograms to determine the correct model, because successive sample autocorrelation coefficients tend to be highly correlated.

The approximate test of amplitudes in harmonic analysis and the decision regarding a proper model depend upon a test of significance for autocorrelation. For this reason the author decided to work on the distribution of $r_L$ in 1939. Because of the mathematical difficulties involved, it was decided to follow up a suggestion of Hotelling to use a circular definition

$$r_L' = \frac{\mathrm{S}\, Z_t Z_{t+L}}{\mathrm{S}\, Z_t^2}, \qquad (5)$$

where t goes from 1 to n, and $Z_{n+t} = Z_t$.
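The circular definition (5) simply wraps the series around, so every one of the n values contributes a cross-product. The following minimal sketch uses a hypothetical function name.

```python
from typing import Sequence


def r_lag_circular(z: Sequence[float], lag: int) -> float:
    """Definition (5): t runs over all n values and Z_{n+t} is taken to be Z_t."""
    n = len(z)
    num = sum(z[t] * z[(t + lag) % n] for t in range(n))
    return num / sum(v * v for v in z)


print(r_lag_circular([1.0, -1.0, 1.0, -1.0], 1), r_lag_circular([1.0, -1.0, 1.0, -1.0], 2))
```

For lag one, the circular coefficient equals the non-circular coefficient (3) plus the single wrap-around term $Z_n Z_1 / \mathrm{S}\,Z_t^2$.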

In 1941, the author studied the distribution for normal Z's when the population mean was zero, and, in 1942, the distribution for Z's which were deviations from the sample mean. Significance levels were computed for $r_1'$, and for several cases of lags greater than one. The theory was simplified by the fact that

$$r_L' = \frac{\sum_i \lambda_i m_i}{\sum_i m_i},$$

where the m's are χ² variables with one degree of freedom, and the λ's are latent roots of the characteristic equation of the matrix of the coefficients in the numerator. Koopmans [22] reported on the distribution of $r_1$, as an estimate of ρ in the simple autoregressive model:

$$Y_t = \rho Y_{t-1} + \epsilon_t. \qquad (6)$$

At the same time Dixon [16] was studying the moments of the distribution of $r_1'$ and used Beta approximations to the exact distributions to obtain significance levels. T. W. Anderson [14] later showed that no test of the hypothesis ρ = 0 exists which is uniformly most powerful against alternatives of the Koopmans type.
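Here is a numerical check of the latent-root representation for lag one. It is offered as an illustration, not as the derivation used in the paper. The numerator of $r_1'$ is a quadratic form in a circulant matrix with 1/2 on the two wrap-around off-diagonals. That matrix's latent roots are cos(2πk/n), and the discrete Fourier transform supplies the corresponding rotation. Here the weights $m_k$ are squared Fourier magnitudes. Pairing k with n − k and passing to real sine and cosine coordinates turns them into the one-degree-of-freedom terms of the text. The function names are ours.

```python
import cmath
import math


def r1_circular(z):
    """Circular lag-one coefficient r_1' computed directly from definition (5)."""
    n = len(z)
    return sum(z[t] * z[(t + 1) % n] for t in range(n)) / sum(v * v for v in z)


def r1_from_latent_roots(z):
    """The same coefficient written as sum(lambda_k * m_k) / sum(m_k)."""
    n = len(z)
    lambdas, weights = [], []
    for k in range(n):
        zk = sum(z[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        lambdas.append(math.cos(2 * math.pi * k / n))  # latent root of the numerator matrix
        weights.append(abs(zk) ** 2 / n)               # squared length along that latent vector
    return sum(lam * m for lam, m in zip(lambdas, weights)) / sum(weights)


z = [0.5, -1.2, 2.0, 0.3, -0.7, 1.1]
print(r1_circular(z), r1_from_latent_roots(z))
```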

Some time before this, a problem involving autocorrelation came up in industrial quality control, in which the mean tended to creep up and down slightly on successive observations. In order to study the variation in the production process, von Neumann, et al. [35] suggested that the statistic

$$\tfrac{1}{2}\delta^2 = \mathop{\mathrm{S}}_{t=1}^{n-1} (Z_{t+1} - Z_t)^2 \big/ 2(n-1) \qquad (7)$$

be used to estimate σ². Williams [40] and von Neumann [33, 34] studied the ratio $\delta^2/s^2$, where $s^2 = \mathrm{S}\,Z^2/n$ and $Z = Y - \bar{Y}$. Young [42] tabulated significance levels of a linear function of this ratio by use of an Incomplete Beta approximation. Hart [19, 20] tabulated probabilities by use of a series approximation suggested by R. H. Kent. We note that $\delta^2/s^2 = 2n(1 - r_c)/(n-1)$, where

$$r_c = \frac{\tfrac{1}{2}(Z_1^2 + Z_n^2) + \mathop{\mathrm{S}}_{t=1}^{n-1} Z_t Z_{t+1}}{\mathop{\mathrm{S}}_{t=1}^{n} Z_t^2}. \qquad (8)$$

T. W. Anderson [14] showed that $r_c$ could be used instead of $r_1$ to test the hypothesis ρ = 0 for Koopmans' model. T. W. Anderson transformed Hart's significance levels [20] to significance levels of $r_c$.
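The identity $\delta^2/s^2 = 2n(1 - r_c)/(n-1)$ is easy to confirm numerically. In this sketch $\delta^2 = \mathrm{S}(Z_{t+1} - Z_t)^2/(n-1)$, so half of it is statistic (7). The data and function names are made up for illustration.

```python
def von_neumann_ratio(y):
    """Return delta^2 / s^2 and the deviations Z = Y - Ybar."""
    n = len(y)
    ybar = sum(y) / n
    z = [v - ybar for v in y]
    delta2 = sum((z[t + 1] - z[t]) ** 2 for t in range(n - 1)) / (n - 1)
    s2 = sum(v * v for v in z) / n
    return delta2 / s2, z


def r_c(z):
    """Statistic (8): end terms weighted by 1/2, plus the lag-one cross-products."""
    n = len(z)
    num = 0.5 * (z[0] ** 2 + z[-1] ** 2) + sum(z[t] * z[t + 1] for t in range(n - 1))
    return num / sum(v * v for v in z)


y = [3.1, 2.7, 3.5, 4.0, 3.8, 4.6, 4.1, 5.0]   # illustrative data only
ratio, z = von_neumann_ratio(y)
n = len(y)
print(ratio, 2 * n * (1 - r_c(z)) / (n - 1))
```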

A non-parametric test for randomness by Wald and Wolfowitz [36] is based on the numerator of $r_1'$ not corrected for the mean. Wallis and Moore [37] developed a series of non-parametric tests based on the signs of differences. Further contributions were made by Rubin [32], Madow [26], Hsu [21], Leipnik [25], Lehmann [23] and Quenouille [29, 30, 31].

If we let ε be the error vector and $\sigma^2\Lambda$ its covariance matrix, dependent upon $\sigma^2, \rho_1, \rho_2, \ldots, \rho_{n-1}$, Lehmann and Stein [24] have shown that the best test statistic to test the hypothesis that all $\rho_i = 0$ is

$$\frac{\epsilon'\Lambda^{-1}\epsilon}{\epsilon'\epsilon}.$$

Whittle [39] used this method to test the null hypothesis that the data 
follow a first order moving average against the alternative that they 
follow an autoregressive scheme of first order, and vice versa. 

T. W. Anderson and the author [13] derived the distribution of the circular autocorrelation coefficient for residuals from a fitted Fourier series. Significance levels were found and their use indicated. Exact distributions were possible because of the correspondence between the sine and cosine variables and the λ's in the distribution of the numerator.

Durbin and Watson [17] derived some approximate tests of autocorrelation of the successive residuals in least squares regression with fixed X's. Let the n successive least squares residuals be $Z_1, Z_2, \ldots, Z_n$. Durbin and Watson chose a modification of the von Neumann statistic,

$$d = \frac{\mathrm{S}(\Delta Z)^2}{\mathrm{S}\, Z^2} = \frac{\mathop{\mathrm{S}}_{t=1}^{n-1} (Z_{t+1} - Z_t)^2}{\mathop{\mathrm{S}}_{t=1}^{n} Z_t^2}, \qquad (9)$$
to test for the existence of autocorrelation in the errors (ε's). We note that

$$\frac{\delta^2}{s^2} = \frac{nd}{n-1},$$

so that $d = 2(1 - r_c)$, where $r_c$ was T. W. Anderson's statistic [14] to test the hypothesis ρ = 0 for Koopmans' model. It should be emphasized that the original von Neumann and T. W. Anderson statistics do not refer to deviations from a fitted regression; hence the Hart [20] and T. W. Anderson [14] significance levels cannot be used here. However, it would appear reasonable to expect that if we have a large positive autocorrelation, d should be near zero, and for a large negative autocorrelation, d should be near four.
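The sketch below computes d from least squares residuals with one fixed predictor and checks that $d = 2(1 - r_c)$. The data and function names are hypothetical. With more predictors, the residuals would come from a multiple regression fit instead.

```python
def ols_residuals(x, y):
    """Residuals from the least squares line Y = b0 + b1 X (one fixed predictor)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    b1 = sxy / sxx
    return [(b - ybar) - b1 * (a - xbar) for a, b in zip(x, y)]


def durbin_watson_d(z):
    """Statistic (9): S(Z_{t+1} - Z_t)^2 / S Z_t^2."""
    return sum((z[t + 1] - z[t]) ** 2 for t in range(len(z) - 1)) / sum(v * v for v in z)


def r_c(z):
    """Statistic (8) applied to the residuals."""
    num = 0.5 * (z[0] ** 2 + z[-1] ** 2) + sum(z[t] * z[t + 1] for t in range(len(z) - 1))
    return num / sum(v * v for v in z)


x = list(range(1, 13))
y = [2.1, 2.9, 4.2, 4.8, 6.3, 6.9, 7.2, 8.8, 9.1, 10.4, 10.9, 12.5]  # made-up data
z = ols_residuals(x, y)
d = durbin_watson_d(z)
print(round(d, 4), round(2 * (1 - r_c(z)), 4))
```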

Unfortunately an exact distribution could not be evaluated because the regression variables were not latent roots of the numerator matrix. Hence, only upper and lower bounds of the significance levels ($d_U$ and $d_L$) could be computed. This was done for 5%, 2.5%, and 1% one-tailed tests, for n = 15 (1) 40 (5) 100 and for r = 1 (1) 5. It should be noted that $d_U$ and $d_L$ diverge more as r increases and also as n decreases.

In most cases, the experimenter desires a test of the null hypothesis against the alternative of positive correlation. Hence, one should expect a small value of d when the null hypothesis is false, and we should use the following testing procedure: if the computed value, d, is less than the tabulated value, d*, the null hypothesis is rejected. On the other hand, if the alternative hypothesis is negative correlation, one would expect a value of d near 4 when the null hypothesis is false. In this case we consider d' = 4 − d and test d' against d*, as above. Since only upper and lower bounds on the significance levels are available, we proceed as follows:

(i) If d (or d') is less than $d_L$, reject the null hypothesis.
(ii) If d (or d') is greater than $d_U$, do not reject.
(iii) If $d_L < d$ (or d') $< d_U$, the test is inconclusive.

If the experimenter wants a two-tailed test, he doubles the significance probability and proceeds as follows:

(i) If d or d' is less than $d_L$, reject.
(ii) If $d_U < d < 4 - d_U$, do not reject.
(iii) Inconclusive otherwise.
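The decision rules above translate directly into code. In this sketch the bounds $d_L$ and $d_U$ are inputs that would be read from the Durbin-Watson tables for the chosen n, r and level. For the two-tailed rule, they come from the table at half the nominal level, since the significance probability is doubled. The numbers used below are placeholders, not table values.

```python
def dw_one_tailed(d, d_lower, d_upper, alternative="positive"):
    """Bounds test against positive (use d) or negative (use d' = 4 - d) autocorrelation."""
    stat = d if alternative == "positive" else 4 - d
    if stat < d_lower:
        return "reject"
    if stat > d_upper:
        return "do not reject"
    return "inconclusive"


def dw_two_tailed(d, d_lower, d_upper):
    """Two-tailed rule; the bounds are taken from the table at half the nominal level."""
    if d < d_lower or 4 - d < d_lower:
        return "reject"
    if d_upper < d < 4 - d_upper:
        return "do not reject"
    return "inconclusive"


dL, dU = 1.08, 1.36   # placeholder bounds for illustration only
for d in (0.9, 1.2, 2.0, 3.1):
    print(d, dw_one_tailed(d, dL, dU), dw_two_tailed(d, dL, dU))
```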


An approximate procedure is available for large values of (n − r − 1), say greater than 40. In this case, (1/4)d was transformed to a Beta distribution, as Dixon [16] did for $r_1'$, with parameters p and q, where

$$p + q = \frac{E(d)\,[4 - E(d)]}{\sigma^2(d)} - 1, \qquad p = \tfrac{1}{4}(p + q)\,E(d).$$

An approximate test statistic is $F = p(4 - d)/qd$ with $n_1 = 2q$ and $n_2 = 2p$ degrees of freedom. Or one can use Incomplete Beta tables. Durbin and Watson also present another approximation. Formulas for E(d) and σ²(d) are presented in the 1951 article. Unfortunately, exact significance levels are really needed for small values of (n − r − 1), when $d_L$ and $d_U$ tend to be wide apart.
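The moment-matching step can be sketched as follows. E(d) and σ²(d) must be supplied from the Durbin-Watson formulas, which are not reproduced here, so the moments in the example are hypothetical. The function returns p, q, the F statistic and its degrees of freedom $(n_1, n_2) = (2q, 2p)$.

```python
def beta_approximation(d, mean_d, var_d):
    """Fit Beta(p, q) to d/4 by matching E(d) and sigma^2(d); return p, q, F and its d.f."""
    p_plus_q = mean_d * (4 - mean_d) / var_d - 1
    p = p_plus_q * mean_d / 4
    q = p_plus_q - p
    f_stat = p * (4 - d) / (q * d)
    return p, q, f_stat, (2 * q, 2 * p)   # F has n1 = 2q and n2 = 2p degrees of freedom


# Hypothetical moments, not computed from any real design matrix
p, q, f_stat, (n1, n2) = beta_approximation(d=1.4, mean_d=2.0, var_d=0.1)
print(p, q, round(f_stat, 3), n1, n2)
```

The tail area can then be taken from F tables or, if SciPy is available, from `scipy.stats.f.sf(f_stat, n1, n2)`. A small d (positive autocorrelation) corresponds to a large F.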

Durbin and Watson [17, 1951] also present methods of testing for 
autocorrelation with one- and two-way classification data and for cur- 
vilinear regression with equally spaced X’s. 

An example is presented for each of the three types of regression models. Short-cut methods of computing S(ΔZ)² are presented for each case. Of course, SZ² is simply the error sum of squares. For example, with multiple or curvilinear regression, where $Y - \bar{Y}$ is estimated by $\sum_i b_i (X_i - \bar{X}_i)$, we have

$$\Delta Z = \Delta Y - \sum_i b_i\, \Delta X_i.$$

Hence,

$$\mathrm{S}(\Delta Z)^2 = \mathrm{S}(\Delta Y)^2 + \sum_i \sum_j b_i b_j\, \mathrm{S}(\Delta X_i\, \Delta X_j) - 2 \sum_i b_i\, \mathrm{S}(\Delta X_i\, \Delta Y).$$

Special formulas can be used for curvilinear regression, because of the orthogonal polynomials used in computing the regression coefficients.
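The intercept and the means drop out on differencing, so the short-cut holds for any set of coefficients $b_i$. That makes it easy to verify numerically. Here is a minimal sketch with two predictors and made-up data; the helper names are ours.

```python
def diffs(v):
    """First differences, Delta v_t = v_{t+1} - v_t."""
    return [v[t + 1] - v[t] for t in range(len(v) - 1)]


def s(a, b):
    """S(ab): summation of products over sample values."""
    return sum(x * y for x, y in zip(a, b))


def s_dz2_direct(y, xs, bs):
    """S(Delta Z)^2 from the residuals Z = (Y - Ybar) - sum b_i (X_i - Xbar_i)."""
    n = len(y)
    ybar = sum(y) / n
    xbars = [sum(x) / n for x in xs]
    z = [(y[t] - ybar) - sum(b * (x[t] - xb) for b, x, xb in zip(bs, xs, xbars))
         for t in range(n)]
    dz = diffs(z)
    return s(dz, dz)


def s_dz2_shortcut(y, xs, bs):
    """The same quantity from sums of products of the differenced variables."""
    dy = diffs(y)
    dxs = [diffs(x) for x in xs]
    total = s(dy, dy)
    total += sum(bi * bj * s(dxi, dxj) for bi, dxi in zip(bs, dxs) for bj, dxj in zip(bs, dxs))
    total -= 2 * sum(bi * s(dxi, dy) for bi, dxi in zip(bs, dxs))
    return total


y = [4.0, 5.5, 5.1, 6.8, 7.2, 8.9, 8.4]
xs = [[1, 2, 3, 4, 5, 6, 7], [2.0, 1.5, 2.5, 3.0, 2.0, 3.5, 3.0]]
bs = [0.8, 0.6]   # any coefficients work; in practice these are the least squares b_i
print(s_dz2_direct(y, xs, bs), s_dz2_shortcut(y, xs, bs))
```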

Moran [28] presents an exact test of autocorrelation of the residuals, $Z_t$, when only one predictor is used. He uses the circular autocorrelation coefficient,

$$R_1 = \frac{\mathrm{S}\, Z_t Z_{t+1}}{\mathrm{S}\, Z_t^2},$$

where $Z_{n+1} = Z_1$, and gives formulas for $E(R_1)$ and $\sigma^2(R_1)$.


3. ESTIMATING REGRESSION COEFFICIENTS WHEN THE ERRORS
ARE AUTOCORRELATED


To date most of the successful research on autocorrelation has been devoted to the problem of testing for its existence. All too little is known of what to do if the errors actually are autocorrelated. Aitken [1] first showed that if one knew the population covariance matrix for the ε's, he could transform the regression model so that the method of least squares would give efficient estimates of the β's. If the covariance matrix of the ε's is $\Lambda\sigma^2$ and the regression model (in matrix form) is $Y = X\beta + \epsilon$, we premultiply this regression model by the non-singular matrix H, where

$$H \Lambda H' = I$$

and I is the n × n identity matrix. But even if Λ were known, the solution for H might be very difficult. However, if the ε's follow a first order autoregressive process with autocorrelation ρ and variance $\sigma^2/(1 - \rho^2)$, the transformation is quite simple:

$$\epsilon_1^* = \sqrt{1 - \rho^2}\,\epsilon_1, \qquad \epsilon_i^* = \epsilon_i - \rho\,\epsilon_{i-1} \quad \text{for } 1 < i \le n.$$

The transformations for higher order autoregressive processes and for moving average processes are more complicated. A good explanation of this is given by Watson [54].
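For the first order autoregressive case, the sketch below builds H and Λ explicitly for a small n and confirms numerically that $H\Lambda H' = I$. It assumes, as in the text, that $\Lambda\sigma^2$ is the error covariance, with $\Lambda_{ij} = \rho^{|i-j|}/(1-\rho^2)$. In use, the same transformation would be applied to Y and to every column of X, including the constant column, before ordinary least squares. The helper names are hypothetical.

```python
import math


def ar1_h_matrix(n, rho):
    """Rows of H: sqrt(1 - rho^2) e_1, then e_i - rho e_{i-1} for i = 2, ..., n."""
    h = [[0.0] * n for _ in range(n)]
    h[0][0] = math.sqrt(1 - rho ** 2)
    for i in range(1, n):
        h[i][i] = 1.0
        h[i][i - 1] = -rho
    return h


def ar1_lambda(n, rho):
    """Lambda with Cov(eps) = Lambda sigma^2, i.e. Lambda_ij = rho^|i-j| / (1 - rho^2)."""
    return [[rho ** abs(i - j) / (1 - rho ** 2) for j in range(n)] for i in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def ar1_transform(v, rho):
    """Apply H to one series without forming the matrix (use on Y and each column of X)."""
    return [math.sqrt(1 - rho ** 2) * v[0]] + [v[i] - rho * v[i - 1] for i in range(1, len(v))]


n, rho = 6, 0.7
h = ar1_h_matrix(n, rho)
check = matmul(matmul(h, ar1_lambda(n, rho)), transpose(h))
print(all(abs(check[i][j] - (i == j)) < 1e-12 for i in range(n) for j in range(n)))
```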

Allowing for the difficulty of making the transformation if Λ is known, the major defect is the lack of knowledge regarding the true value of Λ. Most time series are too short to enable one to derive good estimates of the parameters in Λ, or even to determine the type of process which is operating. A recent attempt to bypass the transformation problem when the ε's follow an autoregressive process was made by Champernowne [44]. He assumes that the model for the ε's is

$$\sum_{s=0} \gamma_s(\epsilon_{t-s} - \alpha) = \delta_t,$$

where the δ's are assumed normally and independently distributed with zero mean and variance σ². Champernowne presents the following results:

(i) Assuming the γ's are known, α was determined as a weighted mean and σ² as a weighted quadratic function of observed values of the ε's.
(ii) Assuming the γ's are known, estimates of and confidence limits for the regression coefficients are derived, both with α known and α unknown.
(iii) The results in (ii) are derived for α = 0.
(iv) If the γ's are not known, the least-squares estimates of the regression coefficients are not linear functions of the observed Y's and X's; hence, the usual χ² distribution theory does not hold exactly. A method involving the application of Bayes' Theorem was used in this case.
(v) A brief discussion is given of these problems when the X's also have disturbances.


Cochrane and Orcutt [46] indicate three principal reasons that the
ε's in economic time series models tend to be positively autocorrelated:







(i) Faulty choice of the form of the regression model. 
(ii) Omission of important variables from the model. 
(iii) Use of incorrect variables or poor data. 


They analyzed the sample residuals for a number of econometric studies, and found many significant autocorrelations, using von Neumann's statistic, δ²/s². As indicated earlier, this statistic does not take account of the added correlation of the estimated residuals resulting from the necessity of estimating the regression coefficients; this defect becomes worse as the number of X's increases. In addition, δ²/s² does not take account of the autoregressive nature of many X-variables. Cochrane and Orcutt also conducted some empirical sampling experiments to indicate the effects of autoregressive error processes on least squares regression analysis, with the following indicated results:

(i) The sample residuals tend to be biased towards randomness.
(ii) The variances of least squares estimates of the regression coefficients are very large if the errors are highly autocorrelated (in their example, ρ ≥ .8).
(iii) If the autocorrelations could be reduced to ρ < .3, or perhaps even ρ < .5, by use of a simple transformation, these variances appear to be close to those with random errors.
(iv) The removal of trend seems to be a crude but effective transformation in many cases.
(v) If sample residuals are used to estimate the error variance, σ², this estimate will be too small if the errors are positively correlated. This result can be proven exactly; see, for example, Cochran [45].
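Result (v) can be illustrated with a small simulation. The set-up is our own, not the design used by Cochrane and Orcutt: n = 20, ρ = .8, a straight-line trend as the only predictor, and unit innovation variance. It averages the usual residual estimate RSS/(n − 2) over many samples and compares the average with the true error variance 1/(1 − ρ²).

```python
import random


def simulate_s2(n=20, rho=0.8, reps=2000, seed=1):
    """Average of RSS/(n - 2) for a straight-line fit when the errors are AR(1)."""
    rng = random.Random(seed)
    x = list(range(1, n + 1))                      # fixed predictor: time
    xbar = sum(x) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    total = 0.0
    for _ in range(reps):
        eps = [rng.gauss(0.0, 1.0) / (1 - rho ** 2) ** 0.5]   # start in the stationary state
        for _ in range(n - 1):
            eps.append(rho * eps[-1] + rng.gauss(0.0, 1.0))
        y = [2.0 + 0.5 * a + e for a, e in zip(x, eps)]       # arbitrary true line
        ybar = sum(y) / n
        b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sxx
        rss = sum(((b - ybar) - b1 * (a - xbar)) ** 2 for a, b in zip(x, y))
        total += rss / (n - 2)
    return total / reps


true_var = 1 / (1 - 0.8 ** 2)   # Var(eps_t) for unit innovations
print(round(simulate_s2(), 3), round(true_var, 3))
```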


Cochrane and Orcutt state that for many economic variables, it is a simple and practical procedure to analyze the first differences of the various series. If the original regression equation is

$$Y_t = \beta_0 + \sum_{i=1}^{r} \beta_i X_{it} + \epsilon_t,$$

the transformed equation will be

$$\Delta Y_t = \sum_{i=1}^{r} \beta_i\, \Delta X_{it} + \Delta\epsilon_t,$$

where $\Delta Z_t = Z_{t+1} - Z_t$. This would be the exact transformation for ρ = 1 in a first order autoregressive model, except that ρ must be less than 1 in order to avoid an explosive situation. However, the transformation should be reasonably good if ρ is near 1, and it is certainly very simple. If the sample residuals, after transforming the variables in this manner, are still highly autocorrelated, one might use the estimated autocorrelation coefficients to try a new transformation.
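The first-difference procedure can be sketched on simulated data. The predictor, the value ρ = .9 and the true slope 2.0 are assumptions chosen for illustration. The sketch fits the original equation and the differenced equation, which has no constant term. It then reports the Durbin-Watson d of each set of residuals. With errors this close to ρ = 1, the differenced residuals should look far less autocorrelated.

```python
import random


def fit_line(x, y):
    """Least squares Y = b0 + b1 X; returns (b1, residuals)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sum((a - xbar) ** 2 for a in x)
    return b1, [(b - ybar) - b1 * (a - xbar) for a, b in zip(x, y)]


def fit_through_origin(x, y):
    """Least squares Y = b X with no constant (beta_0 drops out after differencing)."""
    b = sum(a * c for a, c in zip(x, y)) / sum(a * a for a in x)
    return b, [c - b * a for a, c in zip(x, y)]


def diffs(v):
    return [v[t + 1] - v[t] for t in range(len(v) - 1)]


def durbin_watson_d(z):
    return sum((z[t + 1] - z[t]) ** 2 for t in range(len(z) - 1)) / sum(v * v for v in z)


rng = random.Random(7)
n, rho = 200, 0.9
x = [t / 10 + rng.gauss(0.0, 1.0) for t in range(n)]   # hypothetical fixed predictor
eps = [rng.gauss(0.0, 1.0)]
for _ in range(n - 1):
    eps.append(rho * eps[-1] + rng.gauss(0.0, 1.0))
y = [1.0 + 2.0 * a + e for a, e in zip(x, eps)]          # true slope 2.0

b_levels, z_levels = fit_line(x, y)
b_diff, z_diff = fit_through_origin(diffs(x), diffs(y))
print(f"levels:      b = {b_levels:.3f}, d = {durbin_watson_d(z_levels):.2f}")
print(f"differences: b = {b_diff:.3f}, d = {durbin_watson_d(z_diff):.2f}")
```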
Stone [52] used the method of first differences advocated by Cochrane
and Orcutt [46] to reanalyze his market demand data [51]. Stone uses 
the von Neumann statistic with the sample residuals to test for auto- 
correlation in the errors. He found the average autocorrelation for 
13 analyses highly significant before transforming and almost equal 
to its expectation after transforming. It was interesting to note that 
the two sets of regression coefficients were not materially different. 

Watson [54] has investigated the efficiencies and estimated variances of least-squares estimates of regression coefficients for fixed X's, and tests of hypotheses concerning them, when an incorrect transforming model is used. General solutions of the following type are presented: bounds on the bias of the estimated variance, a lower bound to the efficiency of the estimates of regression coefficients, and some bounds on the significance points of the t- and F-tests. He then discusses the following special types of incorrect transformations:

(i) Assumed and true error processes are both autoregressive.
  (a) Both are first order, but an incorrect $\rho_1$ is used. The greatest bias to the estimated variance is a downward bias when $\rho_1$ is underestimated. This offers some justification for the use of the first difference transformation, which overestimates $\rho_1$; $\rho_1$ is generally underestimated from sample residuals. However, we note low efficiencies of estimates of regression coefficients when $\rho_1$ is overestimated, unless $\rho_1$ is nearly 1.
  (b) True process is second order and assumed process is first order. Results depend on how accurately one knows $\rho_1$ and on the magnitude of $\rho_2$.
(ii) Assumed and true error processes are both moving averages.
  (a) Both are first order, with an incorrect $\rho_1$ used. Results in (i) are reversed.
  (b) True process is second order and assumed process is first order. Indications are that an incorrect order is more serious for a moving average than for an autoregressive process.
(iii) Assumed process is first order autoregressive and true process is first order moving average. Even when $\rho_1$ is estimated correctly, the bias in the variance can be appreciable and the efficiency quite low.

In all cases the true probabilities for 5% significance levels may be considerably different, the bounds being of the order of less than 1% to over 10% in many cases of what would appear to be only mildly inaccurate estimates.

Watson is rather pessimistic regarding the use of transforming devices to remove the effect of autocorrelation in least squares analysis of time series data. However, he believes that more investigations need to be made of correlograms of residuals to see if a good analysis can be constructed on the basis of these correlograms. Quenouille [48] presents a test of the hypothesis that a sample was drawn from an autoregressive scheme of specified order, and Wold [55] did the same for a moving average process. Similar tests are given by Bartlett and Diananda [43] and Walker [53]. However, more efficient methods are needed, and especially we need to determine the proper process and order. After all, as Watson [54] remarks, one must use some kind of an analysis, and it is the duty of the statistician to find a good method, even if it is not the correct one.

A series of articles sponsored by the Indian Statistical Institute [47]
describes the results of using empirical sampling methods to evaluate
the usefulness of the Wold [55] and Quenouille [48] large sample tests 
for short series. Matthai and Kannan considered three different moving 
average models and S. R. Rao and Som two autoregressive models. 
Series of length 15 and 35 were used. It was shown that both large 
sample tests gave far too many significant results for the short series 
used. Quenouille’s test showed that a second order autoregressive 
model would not fit third order moving average data; however, Wold’s 
test indicated that a third order moving average model could be used 
even if the data were second order autoregressive. This may indicate 
that a moving average model of high order is more likely to represent 
a given set of data than is an autoregressive model. Or it may indicate 
that Quenouille’s test is more powerful than Wold’s in indicating the 
correct process. It was interesting to note that in both studies the cor- 
relogram was well estimated, if one knew the correct process. The third 
paper, by C. R. Rao, presents a sequential procedure for determining 
the number of sample autocorrelation coefficients needed to estimate 
the correlogram. Rao advocates the use of likelihood to discriminate 
between several possible models to represent a given set of data. 

Sastry [49] used the above models and data to investigate the small 
sample bias in the estimates of the autocorrelation coefficients. He 
first compared definitions (2) and (3) and concluded that (2) was 
superior. However (3) is better for small lags and is certainly much 
easier to compute. In general small sample estimates have large biases, 
even for series of 100. The size of the bias depends on the type of model 
(it was much less for a second order autoregressive model than for the 
other four models) and on the values of the parameters in the model. 

Sastry [49] also considers some theoretical results for comparing 
two series of autocorrelated variates, x and y. He presents the ex- 
pected values of the means, variances, variances of the means, and co- 
variances of x and y, and some higher moments for normal variates. 
He proposes this new statistic to test the hypothesis that E(x) =p: 
degrees of freedom. Sastry does not indicate how useful t′ will be when $\rho_x$ must be estimated from the data. One can surely see that relatively unbiased estimates need to be obtained. And, most important for regression analysis, he presents the expected values and variances of estimates of the parameters in $E(y) = \alpha + \beta x$.


4. FURTHER COMMENTS ON AUTOREGRESSIVE MODELS


Although the main topic of this paper is a discussion of regression 
analysis with fixed X’s, some references on the use of autoregressive 
models will be included. These models were first discussed by Yule 
and have been used extensively by economists. The regression coeffi- 
cients in these models are functions of the autocorrelation coefficients. 
Hurwicz [58] shows that least squares estimates of the parameters 
are biased in small samples. As indicated previously, Mann and Wald 
[6] showed that this bias approached zero as the sample size increased. 
Only large sample least-squares variances and covariances of the esti- 
mates of the parameters are available; hence, confidence limits for the 
parameters and predicted values are available only for large samples. 
Tintner [7] presents an example for a third-order process. Kendall [5, 
59] gives further information on the use of least squares to estimate the 
parameters. 
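The small-sample bias noted by Hurwicz, and its disappearance as n grows, can be seen in a quick simulation. This sketch assumes a first order model with $\beta_0 = 0$, ρ = .5 and normal errors. It uses the least squares estimate $\hat\rho = \mathrm{S}\,Y_t Y_{t-1}/\mathrm{S}\,Y_{t-1}^2$. The set-up is ours, not Hurwicz's.

```python
import random


def mean_ls_estimate(n, rho=0.5, reps=4000, seed=3):
    """Average least squares estimate of rho in Y_t = rho Y_{t-1} + eps_t (beta_0 = 0)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        y = [rng.gauss(0.0, 1.0) / (1 - rho ** 2) ** 0.5]   # stationary starting value
        for _ in range(n):
            y.append(rho * y[-1] + rng.gauss(0.0, 1.0))
        num = sum(y[t] * y[t - 1] for t in range(1, len(y)))
        den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
        total += num / den
    return total / reps


small, large = mean_ls_estimate(10), mean_ls_estimate(200, reps=500)
print(round(small, 3), round(large, 3))
```

The average for n = 10 should fall noticeably below .5, while the average for n = 200 should be close to .5.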

Bartlett [2] presents a method of estimation based on the concept of a continuous rather than a discrete process. Ghurye [57] has developed a method of using more of the autocorrelation coefficients in estimating the parameters. He introduces a superposed variation for each operation, so that the model is (assuming $\beta_0 = 0$):

$$(Y_t + \eta_t) = \sum_{i=1}^{r} \beta_i (Y_{t-i} + \eta_{t-i}) + \epsilon_t.$$

An extensive study of autoregressive analysis is presented by Orcutt [60].
Das [56] uses empirical sampling methods to measure the goodness
of least squares methods for three equation economic models, in which 
one equation is quantity in terms of present prices and the others 
involve only lagged variables as predictors. 


5. OMITTED TOPICS 


The following topics, of importance in the analysis of time series, have not been discussed in detail.

(i) The estimation of parameters in a multi-equation system. For a discussion 
of this procedure, see for example Koopmans [62] and Klein [61]. An 
article by Orcutt and Cochrane [64] presents an empirical sampling study 
of the adverse effect of autocorrelation on the estimates of structural 
parameters in a multi-equation model. They concluded that, “Unless it
is possible to specify something about the intercorrelations of the error
terms in a set of relations and to choose approximately the correct auto- 
regressive transformation, a certain amount of skepticism is justified con- 
cerning the possibility of estimating structural parameters from aggrega- 
tive time series of only twenty observations.” 

(ii) Comparing two time series. See Bartlett [15, 2], Orcutt and James [65] and
Moran [63]. 


6. SUMMARY 


Much research has been devoted to the distributions of various sta- 
tistics used to test for the existence of autocorrelation of successive 
observations. Others have studied the problem of estimating parameters 
in various stochastic processes, such as autoregressive and moving 
average processes. A summary of this research is given in this paper. 

Only recently has research been extended to the problem of testing for the existence of autocorrelated errors in regression models, such as

$$Y_t = \beta_0 + \sum_{i=1}^{r} \beta_i X_{it} + \epsilon_t, \qquad t = 1, 2, \ldots, n,$$

where the X's are fixed predictors and the ε's are normally distributed with equal variance. Durbin and Watson [17] present upper and lower bounds on the significance levels for making such tests. Moran [28] presents an exact test for r = 1.
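Once the least-squares residuals are available, the statistic whose significance points Durbin and Watson bound takes only a few lines to compute. The sketch below assumes the residuals $e_1, \ldots, e_n$ have already been obtained (fitting the regression is not shown). The function name `durbin_watson` is ours. The bounds against which d is compared must still come from Durbin and Watson's tables.

```python
from typing import Sequence


def durbin_watson(residuals: Sequence[float]) -> float:
    """Return d = sum_{t=2}^{n} (e_t - e_{t-1})^2 / sum_{t=1}^{n} e_t^2.

    Values of d well below 2 point toward positive autocorrelation of
    successive errors; values near 2 suggest little first-order autocorrelation.
    """
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


# Residuals that alternate in sign give a large d; a constant run gives d = 0.
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0
print(durbin_watson([2.0, 2.0, 2.0]))  # 0.0
```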

Too little information is available on the proper methods of estimating the β's when the ε's are autocorrelated. Aitken [1] indicated the exact method of transforming the regression variables when the autocorrelations are known. Champernowne [44] added to this general theory and presented a Bayesian method for the case in which the autocorrelations are not known.
Cochrane and Orcutt [46] used empirical sampling methods to indicate the effects of autocorrelated errors on the estimates of error and of the β's. They showed that, in many cases, first differences of the Y's and X's would have a relatively uncorrelated error process. A series of articles in Sankhya [47] has also used empirical sampling to indicate the large biases in testing and estimation procedures with small samples.
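Both transformations discussed above can be sketched for the simplest case: Aitken's exact transformation when the autocorrelations are known, and the first differences studied by Cochrane and Orcutt. The sketch assumes, purely for illustration, that the errors follow a first-order autoregressive scheme $\epsilon_t = \rho\,\epsilon_{t-1} + u_t$ with ρ known. Replacing every variable z by $z_t - \rho z_{t-1}$ then leaves a regression with uncorrelated errors, and ρ = 1 gives first differences. The helper names are hypothetical. This version simply drops the first observation rather than rescaling it.

```python
from typing import List, Sequence, Tuple


def quasi_difference(z: Sequence[float], rho: float) -> List[float]:
    """Return z_t - rho * z_{t-1} for t = 2, ..., n (the first observation is dropped)."""
    return [z[t] - rho * z[t - 1] for t in range(1, len(z))]


def transform_regression(
    y: Sequence[float], xs: Sequence[Sequence[float]], rho: float
) -> Tuple[List[float], List[List[float]]]:
    """Apply the same quasi-difference to Y and to every predictor column."""
    return quasi_difference(y, rho), [quasi_difference(x, rho) for x in xs]


y = [1.0, 2.0, 4.0, 7.0]
x1 = [0.0, 1.0, 2.0, 3.0]
y_star, (x1_star,) = transform_regression(y, [x1], rho=1.0)
print(y_star, x1_star)  # [1.0, 2.0, 3.0] [1.0, 1.0, 1.0]
```

Under this transformation the constant term is multiplied by 1 − ρ. With first differences (ρ = 1), it therefore drops out of the transformed regression.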

Watson [54] has shown the seriousness of using the wrong type of 
error process and incorrect estimates of the autocorrelations in trans- 
forming the regression variables. He concludes that the most fruitful 
research seems to be in utilizing more efficiently the estimates of the 
autocorrelations. 
JOINT CONFIDENCE REGIONS FOR MULTIPLE
REGRESSION COEFFICIENTS


DAVID DURAND
National Bureau of Economic Research


More and more statisticians are coming to realize that conventional confidence intervals are not strictly applicable to problems requiring the estimation of several parameters. In multiple regression a conventional interval may be correctly determined for one, and usually only one, of the regression coefficients. Ordinarily, however, the statistician wants a measure of accuracy for each of his coefficients, but if he obtains these in the form of conventional confidence intervals, he usually commits a fallacy. Here we discuss the nature of this fallacy and a possible remedy through the use of a joint confidence region.


1. MULTIPLE CONFIDENCE STATEMENTS 


In classical multiple regression it is assumed that the dependent variate Y is normally distributed with constant variance about a linear function

$$b_0 + b_1X_1 + b_2X_2 + \cdots + b_kX_k \tag{1.1}$$

in which the coefficients $b_i$ as well as the variance $\sigma^2$ are unknown parameters. From a set of $n > k+1$ error-free and linearly independent observations on the $X_i$'s and n corresponding values for Y, one obtains the estimates $\hat b_i$ and $\hat\sigma^2$, here understood to be maximum likelihood estimates. Then the theory of confidence provides criteria for judging the accuracy of these estimates.

After deriving the k-variable regression function (1.1), many statisticians would want exactly k+2 confidence statements: one for each of the k+1 $b_i$'s and one for $\sigma^2$. But this rule is far from general. Personal preference or the requirements of the problem may dictate 1, 2, …, k+2, or even more statements. Since the theory of confidence permits statements for linear combinations of the type

$$c_0b_0 + c_1b_1 + c_2b_2 + \cdots + c_kb_k,$$

there is literally no limit to the number of confidence statements that can be considered.

The conventional form of confidence interval for a partial regression coefficient is determined by the relation

$$\hat b_p - t_\alpha\sqrt{\frac{a^{pp}\,n\hat\sigma^2}{n-k-1}} \;\le\; b_p \;\le\; \hat b_p + t_\alpha\sqrt{\frac{a^{pp}\,n\hat\sigma^2}{n-k-1}}, \tag{1.2}$$

where $t_\alpha$ is the upper $\tfrac12\alpha$ point of ‘Student’s’ ratio for $n-k-1$ degrees of freedom, and $a^{pp}$ is the element corresponding to $a_{pp}$ in the inverse of the matrix $\|a_{ij}\| = \big\|\sum_{\nu=1}^{n}(X_{i\nu}-\bar X_i)(X_{j\nu}-\bar X_j)\big\|$. Given a value for α, say .05, (1.2) determines an interval that will cover the true parameter value $b_p$ with probability $1-\alpha$.¹ But although these conventional intervals are entirely valid when properly applied and interpreted, two basic fallacies or improprieties frequently arise in practical work. The first consists in deciding after the experiment what confidence statements to make. The second consists in making several individual statements at level $1-\alpha$ in a way that implies a joint statement at the same level.
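Relation (1.2) reduces to a few lines of arithmetic once $a^{pp}$ (the diagonal element of the inverse matrix) and $t_\alpha$ have been looked up. The sketch below takes both as inputs rather than computing them. The function name and the numbers in the example are hypothetical.

```python
import math
from typing import Tuple


def conventional_interval(
    b_hat: float, a_inv_pp: float, n: int, k: int, sigma2_ml: float, t_crit: float
) -> Tuple[float, float]:
    """Interval (1.2) for one coefficient b_p.

    Est. var. of b_hat = a^{pp} * n * sigma2_ml / (n - k - 1), where sigma2_ml is
    the maximum likelihood variance estimate and t_crit is the upper alpha/2
    point of Student's t with n - k - 1 degrees of freedom.
    """
    est_var = a_inv_pp * n * sigma2_ml / (n - k - 1)
    half_width = t_crit * math.sqrt(est_var)
    return b_hat - half_width, b_hat + half_width


# Hypothetical inputs chosen so that Est. var. = 0.25 and t = 2.0.
print(conventional_interval(0.5, 0.25, n=12, k=1, sigma2_ml=10 / 12, t_crit=2.0))
```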

Concerning the first fallacy, it is common practice in statistical studies in general to go over the data with a fine-toothed comb, to apply a battery of significance tests, and then to select a relatively few conclusions that seem particularly noteworthy. In regression studies it is common to experiment with several equations before selecting one for presentation. This might consist in calculating a regression equation with five variables, discovering that two of the coefficients do not differ significantly from zero, and then recomputing the equation with three variables. But procedures such as these may introduce bias, as may be seen from an extreme example. Suppose that the true regression coefficients were all zero throughout a series of experiments, and suppose that the experimenter made a practice of presenting only regression equations with coefficients significantly different from zero at the level α. Then, if he made confidence statements for these coefficients, his probability of being right would not be $1-\alpha$ at all, but zero; for he would make a statement only when the parameter value zero lay outside the confidence interval.

The second common fallacy in using conventional confidence intervals is the implied joint statement. In a two-variable problem (such as that presented in Section 3) we might make the following three conventional statements at the 95 per cent level:

$$\begin{aligned} \ldots \le\ &b_1 \le \ldots\\ .52 \le\ &b_2 \le .98\\ .91 \le\ &b_1 + b_2 \le .99 \end{aligned}\tag{1.3}$$

The first statement locates the parameter point $(b_1, b_2)$ within an infinite band bounded by two lines parallel to the $b_2$ axis in two-dimensional parameter space (Figure 2); the second locates the same point in a band between lines parallel to the $b_1$ axis; and the third locates the point between two parallel sloping lines. But the three statements, taken all together, locate the point within the hexagon formed by the intersection of the three bands. So, if the confidence level of the individual statements is $1-\alpha = .95$, the level of the joint statement is demonstrably less.

¹ Perhaps it is necessary, for the record, to say a word about the meaning of “probability” in this context. Once the experiment has been performed, the interval either does or does not cover the point $b_p$, and the probability is therefore 1 or 0. Before the experiment, however, we may argue that the outcome is uncertain and that the probability is properly $1-\alpha$. And before performing a series of experiments, we may argue that $1-\alpha$ is the proportion of projected confidence statements that will be correct in the long run.

The actual confidence level of a set of statements like (1.3) can be readily obtained for one or two special cases. The most obvious occurs when, in a k-variable problem, the cross-products $a_{ij} = \sum_{\nu=1}^{n}(X_{i\nu}-\bar X_i)(X_{j\nu}-\bar X_j)$, $i \ne j$, are all zero and the variance $\sigma^2$ is known. Then the sampling distributions of the individual $\hat b_i$'s are all independent, and the joint confidence level for any subset containing exactly m of these coefficients is therefore $(1-\alpha)^m$. Another special case, involving differences between means in the analysis of variance, has been discussed by Tukey [11]. But in general the joint confidence level of statements like (1.3) is not readily obtained, though a lower bound can be obtained as shown in Section 5.
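In the independent special case just described, the joint level is a single power. The short sketch below (function name ours) shows how quickly $(1-\alpha)^m$ falls below the nominal .95 as the number of statements m grows. It applies only under the stated conditions of zero cross-products and known variance.

```python
def joint_level_independent(alpha: float, m: int) -> float:
    """Joint confidence level of m independent statements, each made at level 1 - alpha."""
    if m < 0:
        raise ValueError("m must be non-negative")
    return (1.0 - alpha) ** m


for m in (1, 2, 3, 5, 10):
    print(m, round(joint_level_independent(0.05, m), 4))
```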

As an alternative to calculating the confidence level of joint statements like (1.3), Scheffé [9], Roy and Bose [8], and the author propose to define an infinite set of intervals whose totality is equivalent to a joint confidence ellipse at the level $1-\alpha$. Although confidence ellipses have been understood in theory for some time, they have received little practical application, notwithstanding an important econometric example by Haavelmo [4]. Possibly the unpopularity of the ellipse is due partly to the difficulty of representing it graphically except in two-dimensional examples like Haavelmo's. That difficulty is largely obviated by the proposed technique of substituting an infinite set of intervals.

The use of the ellipse with an infinite set of intervals has two distinct advantages. First, the calculations required are no more difficult than those required in the conventional approach. Second, if a finite or infinite subset of the intervals is chosen in any way whatsoever, before or after the experiment, the confidence level cannot be less than $1-\alpha$; thus the fallacy of choosing statements after the experiment is avoided. At the same time, this approach has one drawback: whenever interest is limited to a finite subset of intervals that can be specified in advance, the confidence level will actually exceed $1-\alpha$, and the intervals will therefore be larger than necessary. For example, Tukey's method for contrasting means in the analysis of variance allows for just $\tfrac12 k(k-1)$ specified contrasts among k means. For k > 2, as shown by Scheffé [9], Tukey's intervals are smaller than those derived from the joint ellipse, and the difference increases as k increases. However, it should be remembered that the joint ellipse is proposed primarily for investigations where the specification of questions in advance is not convenient. The next section presents an example characterized by a proliferation of possible questions and great need for flexibility, and it is for problems of this sort that the joint ellipse, with its infinite set of intervals, is ideally suited.


2. AN EXAMPLE 


In an exploratory cross-section study of seventeen New York bank stocks for February 1951, the dependent variate, log P, had an estimated variance $\hat\sigma^2 = .0006863$ about the estimated regression plane

$$\log P = .037 + .65\log C - .95\log S + .26\log D, \tag{2.1}$$

which then leads to the exponential form

$$P = 1.09\,C^{.65}S^{-.95}D^{.26}. \tag{2.2}$$

In these equations, P is market price (end of February 1951), C is total capital funds (end of 1950, in units of $10,000), S is total number of shares outstanding (end of February 1951, in units of 10,000), and D is total dividend disbursements (during 1950, in units of $100). The means and the aᵢⱼ matrix for the logarithmic variables are given in Table 1. The forward Doolittle solution, which will be needed in the subsequent discussion, is given in Table 2.


TABLE 1

MEANS AND SUMS OF SQUARES AND PRODUCTS ABOUT THE
MEANS FOR REGRESSION ANALYSIS OF 17 NEW
YORK CITY BANK STOCKS, FEBRUARY 1951

Note: The variables in this example were all expressed as common logarithms with 2-place mantissas. The sums of squares and products were rounded to 5 decimals for use in calculations.

                      X1              X2              X3                 Y
                  Log Capital     Log Shares      Log Dividends      Log Price
                  (year end       (February       (total dis-        (February
                  1950, unit:     1951, unit:     bursements,        1951, unit:
                  $10,000)        10,000)         unit: $100)        $1.00)

Means               3.9288          1.9353          4.5206             1.9512

Sums of Squares and Products about the Means
  X1                3.25538         3.51281            …               −.30838
  X2                                7.25942            …              −3.65961
  X3                                                   …               −.16441
  Y                                                                    3.23758


TABLE 2

FORWARD DOOLITTLE SOLUTION FOR REGRESSION
COEFFICIENTS IN EQUATION (2.1)

Note: Although entries have been rounded to 4 decimals for illustration here, the original matrix (see Note, Table 1) was obtained to 5 decimals, and subsequent calculations were carried to 8 or 9 decimals.

       X1           X2           X3            Y
       Log          Log          Log           Log         Check
     Capital      Shares      Dividends       Price         Sum
    ($10,000)    (10,000)      ($100)

     3.2554       3.5128       3.4260         .3084       10.5026
    −1.0000      −1.0791      −1.0524        −.0947       −3.2262

     3.5128       7.2594       3.5538        3.6596       17.9856
    −3.5128      −3.7906      −3.6969        −.3328      −11.3331
                  3.4688       −.1432        3.3268        6.6525
                 −1.0000        .0413        −.9591       −1.9178

     3.4260       3.5538       3.6985         .1644       10.8427
    −3.4260      −3.6969      −3.6056        −.3245      −11.0531
                   .1432       −.0059         .1373         .2746
                                .0870        −.0228         .0642
                              −1.0000         .2622        −.7378

Though the form of equations (2.1) and (2.2) is convenient for computation and meets the needs of an estimating equation, it does not adequately describe the structure of the bank stock market. For this, one wants to relate stock prices to such variables as book value and dividends per share. However, a simple linear transformation on the logarithms of the independent variables in (2.1) and (2.2), namely

$$\log C = \log C,\qquad \log C/S = \log C - \log S,\qquad \log D/S = \log D - \log S,$$

produces a new equation

$$P = 1.09\,C^{-.04}(C/S)^{.69}(D/S)^{.26}, \tag{2.3}$$

where the transformed independent variables are total capital (which indicates size of bank), capital per share (book value), and dividends per share. In this transformed equation the new coefficients (or exponents) are all linear combinations of the original coefficients, thus:

$$-.04 = .65 + .26 - .95,\qquad .69 = .95 - .26.$$

The variance, $\hat\sigma^2 = .0006863$, is unaffected.

At the time this regression analysis was performed, most bank stocks were selling at substantial discounts from book value, a fact that worried many financiers. Again, suitable equations for studying discounts can be derived from (2.2) by other linear transformations on the logarithmic variables. One possible equation is

$$P/B = 1.09\,B^{-.09}(D/C)^{.26}S^{-.04}, \tag{2.4}$$

where B = C/S represents book value. Here, as in (2.3), the new coefficients are all linear combinations of the coefficients of (2.1), and the variance remains unchanged.
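The substitutions behind (2.3) and (2.4) can be written out explicitly. In the derivation below, c stands for the constant .037 of (2.1), and $b_1 = .65$, $b_2 = -.95$, $b_3 = .26$ label its coefficients in the order the variables appear there; both labels are introduced here for illustration. For (2.3), substitute $\log S = \log C - \log(C/S)$ and $\log D = \log(D/S) + \log S$. For (2.4), substitute $\log C = \log B + \log S$ and $\log D = \log(D/C) + \log C$, then subtract $\log B$ from both sides:

```latex
\begin{aligned}
\log P &= c + b_1\log C + b_2\log S + b_3\log D\\
       &= c + (b_1+b_2+b_3)\log C + (-b_2-b_3)\log(C/S) + b_3\log(D/S),\\
\log(P/B) &= c + (b_1+b_3-1)\log B + b_3\log(D/C) + (b_1+b_2+b_3)\log S.
\end{aligned}
```

With the estimates of (2.1), the first expansion gives the exponents −.04, .69, and .26 of (2.3). The second gives −.09, .26, and −.04 of (2.4). The sum $b_1 + b_2 + b_3$ appears in both.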

Thus, by the nature of this problem, it is possible to start with a basic 
equation, (2.1) or (2.2), and to derive from this by suitable trans- 
formations a series of special purpose equations. This process, however, 
multiplies the number of regression coefficients and combinations for 
which confidence statements are required. In addition to the constant 
term, which is not affected by the illustrated transformations, equa- 
tions (2.2), (2.3), and (2.4) contain five different regression coefficients, 
and if all the ramifications of the problem were to be explored, more 
equations and coefficients would undoubtedly arise. Moreover, it 
should be realized that this example was artificially cut down for 
simplicity in presentation. A systematic study of bank stock prices 
should contain anywhere from five to ten basic variables and upwards 
of twenty-five transformed variables. 

To avoid the fallacies of multiple statements in this problem, where so many statements are possible, a joint confidence ellipse will be determined for $b_1$, $b_2$, and $b_3$. Although the constant term $b_0$ could be included in the ellipse (see Section 6), this particular problem is primarily concerned with the structure of the market as indicated by $b_1$, $b_2$, and $b_3$, not with the general level of the market as reflected by $b_0$. However, a statement may be desired for the variance $\sigma^2$. This can be obtained in the conventional manner, provided no joint relationship is thus implied for $\sigma^2$ and the $b_i$'s.


3. THE JOINT CONFIDENCE ELLIPSOID 


A joint confidence region for the k regression coefficients obtained when a single dependent variate is regressed upon k independent variables $X_1, X_2, \ldots, X_k$ is given by the ellipsoid

$$F_\alpha(k, n-k-1) = \frac{(n-k-1)\sum_{i=1}^{k}\sum_{j=1}^{k} a_{ij}(\hat b_i - b_i)(\hat b_j - b_j)}{k n\hat\sigma^2}, \tag{3.1}$$

where $F_\alpha(k, n-k-1)$ is the upper α point of the F-distribution for k and n−k−1 degrees of freedom, n is the number of observations, $a_{ij} = \sum_{\nu=1}^{n}(X_{i\nu}-\bar X_i)(X_{j\nu}-\bar X_j)$, $\hat b_i$ is the maximum likelihood estimate of the true regression coefficient $b_i$, and $\hat\sigma^2$ is the maximum likelihood estimate of the variance.² To apply (3.1), a value of α is chosen, say .05, and the single statement is made with probability $1-\alpha = .95$ that the parameter point $(b_1, b_2, \ldots, b_k)$ lies within the ellipsoid.
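Applying (3.1) to a particular parameter point amounts to evaluating its right-hand member and comparing it with $F_\alpha(k, n-k-1)$. The plain-Python sketch below does this. It assumes the user supplies the $a_{ij}$ matrix, the estimates, and the tabled F value. The function names and the small example are hypothetical.

```python
from typing import Sequence


def ellipsoid_statistic(
    b: Sequence[float],
    b_hat: Sequence[float],
    a: Sequence[Sequence[float]],
    n: int,
    sigma2_ml: float,
) -> float:
    """Right-hand member of (3.1) evaluated at the trial parameter point b."""
    k = len(b_hat)
    d = [b_hat[i] - b[i] for i in range(k)]
    quad = sum(a[i][j] * d[i] * d[j] for i in range(k) for j in range(k))
    return (n - k - 1) * quad / (k * n * sigma2_ml)


def in_joint_region(
    b: Sequence[float],
    b_hat: Sequence[float],
    a: Sequence[Sequence[float]],
    n: int,
    sigma2_ml: float,
    f_crit: float,
) -> bool:
    """True when b lies inside (or on) the 1 - alpha ellipsoid; f_crit = F_alpha(k, n - k - 1)."""
    return ellipsoid_statistic(b, b_hat, a, n, sigma2_ml) <= f_crit


# Hypothetical two-coefficient case: identity a_ij, n = 13, sigma2_ml = 5/13,
# so the statistic reduces to (b_hat_1 - b_1)^2 + (b_hat_2 - b_2)^2.
a = [[1.0, 0.0], [0.0, 1.0]]
print(in_joint_region([1.0, 1.0], [0.0, 0.0], a, n=13, sigma2_ml=5 / 13, f_crit=3.739))  # True
print(in_joint_region([2.0, 0.0], [0.0, 0.0], a, n=13, sigma2_ml=5 / 13, f_crit=3.739))  # False
```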

To illustrate a joint confidence region graphically, the prices of the seventeen New York City bank stocks were regressed on the two variables dividends per share, D/S, and book value, C/S. With variables now expressed in dollars, the following resulted:

$$P = 2.15\,(D/S)^{.20}(C/S)^{.75}. \tag{3.2}$$

A 95 per cent ellipse was then determined by inserting in (3.1) the appropriate numerical values, including $F_{.05}(2, 14) = 3.739$ and $\hat\sigma^2 = .0008806$. A graph is shown in Figure 1. Strictly speaking, this confidence region applies only to points in two-dimensional parameter space, that is, to paired values of $b_1$ and $b_2$. Thus the combination $b_1 = .30$ and $b_2 = .65$ lies within the ellipse and is admissible, whereas the combination $b_1 = .15$ and $b_2 = .70$ lies outside and is inadmissible. It will be noticed immediately, however, that for certain values of $b_1$ (namely, those less than −.09 or greater than .49) all points lie outside the ellipse. Likewise, for $b_2$ less than .45 or greater than 1.05, all points lie outside. Thus we obtain the intervals

$$-.09 \le b_1 \le .49,\qquad .45 \le b_2 \le 1.05, \tag{3.3}$$

each of which corresponds to a confidence level exceeding $1-\alpha$. Although intervals of this type have been called confidence intervals, they are here referred to as “subsidiary intervals” (subsidiary to the ellipse) in order to distinguish them from the conventional intervals.

Fig. 1. 95 per cent joint confidence ellipse and subsidiary intervals for (3.2).

² For the derivation of (3.1) see Wilks [12, Sec. 8.3]. In following the derivation, one must remember that Wilks sets one of the X's arbitrarily equal to unity; hence the corresponding coefficient in his notation is the same as $b_0$ herein, and k in his notation is the same as k+1 herein.

Similar subsidiary intervals can be derived, all from the same ellipse, for linear combinations of the regression coefficients. In regressions employing the exponential $B_0U_1^{b_1}U_2^{b_2}\cdots U_k^{b_k}$, the degree of this homogeneous function is often of interest, and this is indicated by the sum $b_1 + b_2 + \cdots + b_k$. In production studies this sum indicates the degree of returns to scale;³ in the current example, (3.2), it indicates whether a stock split will result in a proportional reduction in the price of the stock. The subsidiary interval for the sum $b_1 + b_2$ in (3.2),

$$.90 \le b_1 + b_2 \le 1.00, \tag{3.4}$$

can be established graphically by drawing lines of the family $y = b_1 + b_2$ in Figure 1 and regarding as inadmissible all those that do not meet the ellipse at any point.

³ In production studies it is usually desirable to assume that the independent variables are subject to error. The treatment of this problem and a substantial bibliography are given by Tintner [10].

The totality of all possible subsidiary intervals is, in a sense, equivalent to the joint ellipse. The combined intervals (3.3) restrict the parameter point $(b_1, b_2)$ to the rectangle BGEH of Figure 1, which includes all of the ellipse. The combination of (3.3) and (3.4) further restricts this point to the hexagon ABCDEF, which again includes the ellipse. If this process is repeated by combining still more intervals (say for $b_2 - b_1$ or $b_1 - 3b_2$), the resulting many-sided figure can be made to approach the ellipse as closely as desired. Hence the totality of all possible subsidiary intervals has a joint confidence level of $1-\alpha$. Accordingly, if one performs a series of experiments, he may expect that in at least $100(1-\alpha)$ per cent of them, all of the subsidiary intervals, however chosen, will cover the true parameter point.


4. SUBSIDIARY INTERVALS IN k DIMENSIONS


In a two-dimensional problem like the preceding, subsidiary limits can be derived graphically with fair accuracy. But when greater accuracy is desired, or when more than two dimensions are involved, an analytic procedure is indicated. The problem in general is to set limits for linear combinations

$$Q = \sum_{i=1}^{k} h_i b_i, \tag{4.1}$$

where the constants $h_i$ are given. If arbitrary values are assigned to Q, equation (4.1) defines a family of hyperplanes in k-dimensional parameter space. In particular, there are two distinct quantities $\overline{Q}$ and $\underline{Q}$ ($\overline{Q} > \underline{Q}$) that define two planes tangent to the confidence ellipse (3.1). Hence all members of family (4.1) having the property $Q > \overline{Q}$ or $Q < \underline{Q}$ will lie completely outside the ellipse, and the subsidiary statement is, therefore, $\underline{Q} \le Q \le \overline{Q}$.

Finding the tangent planes and the quantities $\overline{Q}$ and $\underline{Q}$ may be facilitated by transforming the confidence ellipse (3.1) into a sphere with its center at the origin. For this purpose a linear transformation⁴

$$d_i = \hat b_i - b_i = \sum_{j=1}^{k} c_{ij}\,\delta_j \tag{4.2}$$

is required such that

$$\sum_{i=1}^{k}\sum_{j=1}^{k} a_{ij}(\hat b_i - b_i)(\hat b_j - b_j) = \sum_{i=1}^{k}\sum_{j=1}^{k} a_{ij}\,d_i d_j = \sum_{i=1}^{k}\delta_i^2. \tag{4.3}$$

By means of (4.2) the confidence ellipse is transformed into the sphere

$$\frac{F_\alpha(k, n-k-1)\,k n\hat\sigma^2}{n-k-1} = \sum_{i=1}^{k}\delta_i^2, \tag{4.4}$$

and the tangent planes $\overline{Q} = \sum h_i b_i$ and $\underline{Q} = \sum h_i b_i$ are transformed into new planes tangent to this sphere. For $\overline{Q}$ (a similar relation holds for $\underline{Q}$) the new plane is

$$\sum_{i=1}^{k} h_i\hat b_i - \overline{Q} = \sum_{j=1}^{k} m_j\delta_j, \qquad\text{where}\quad m_j = \sum_{i=1}^{k} c_{ij}h_i.$$

A solution for $\overline{Q}$ is now obtainable from

$$\frac{\overline{Q} - \sum h_i\hat b_i}{\sqrt{\sum m_j^2}} = \sqrt{\frac{F_\alpha(k, n-k-1)\,k n\hat\sigma^2}{n-k-1}}, \tag{4.5}$$

where the left-hand member is the distance between the transformed plane and the origin, and the right-hand member is the radius of the transformed confidence sphere (4.4). A solution for $\underline{Q}$ can be obtained by taking a negative value for one of the square roots in (4.5). Therefore, the subsidiary interval takes the form

$$\sum h_i\hat b_i - \sqrt{\frac{F_\alpha(k, n-k-1)\,k n\hat\sigma^2\sum m_j^2}{n-k-1}} \;\le\; \sum h_i b_i \;\le\; \sum h_i\hat b_i + \sqrt{\frac{F_\alpha(k, n-k-1)\,k n\hat\sigma^2\sum m_j^2}{n-k-1}}. \tag{4.6}$$

⁴ Such a transformation can be found (in fact, a number of different transformations can be found) if the matrix of the $a_{ij}$'s is of rank k (Bôcher [1, p. 134 ff.]). Moreover, the rank will be k if the $X_i$'s are linearly independent, as assumed (Wilks [12, p. 160]).


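In the above, it can be shown (though the proof will not be given) that the quantity

$$\frac{n\hat\sigma^2\sum m_j^2}{n-k-1}$$

is the unbiased estimate of the variance of the linear combination $\hat Q = \sum h_i\hat b_i$. (Since $\sum m_j^2 = \sum_i\sum_j h_i a^{ij} h_j$, this agrees with the usual estimate of the variance of a linear combination of regression coefficients.) Then (4.6) can be written

$$\hat Q - \sqrt{kF_\alpha(k, n-k-1)\;\text{Est. Var. }\hat Q} \;\le\; Q \;\le\; \hat Q + \sqrt{kF_\alpha(k, n-k-1)\;\text{Est. Var. }\hat Q}, \tag{4.7}$$

which is essentially the result derived by Scheffé [9] for contrasts in the analysis of variance and by Roy and Bose [8] for the general linear case.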
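Form (4.7) can be evaluated directly once $\sum m_j^2$ is known. The sketch below (function name ours) reproduces the interval for $b_1 + b_2 + b_3$ that this section's numerical example obtains. It uses $\sum m_j^2 = .3445$, k = 3, n = 17, $\hat\sigma^2 = .0006863$, and $F_{.05}(3, 13) = 3.410$.

```python
import math
from typing import Tuple


def subsidiary_interval(
    q_hat: float, sum_m2: float, k: int, n: int, sigma2_ml: float, f_crit: float
) -> Tuple[float, float]:
    """Subsidiary interval (4.6)/(4.7) for Q = sum h_i b_i.

    Est. Var. of q_hat = n * sigma2_ml * sum_m2 / (n - k - 1); the half-width is
    sqrt(k * F_alpha(k, n - k - 1) * Est. Var.).
    """
    est_var = n * sigma2_ml * sum_m2 / (n - k - 1)
    half_width = math.sqrt(k * f_crit * est_var)
    return q_hat - half_width, q_hat + half_width


lo, hi = subsidiary_interval(-0.04, 0.3445, k=3, n=17, sigma2_ml=0.0006863, f_crit=3.410)
print(f"{lo:.3f} <= b1 + b2 + b3 <= {hi:.3f}")  # -0.096 <= b1 + b2 + b3 <= 0.016
```

The only difference from a conventional interval for one preassigned combination is the multiplier: $\sqrt{kF_\alpha}$ here in place of Student's $t_\alpha$.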

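Of the many possible ways of deriving transformation (4.2), the one chosen for illustration here is conveniently used in conjunction with the Doolittle solution for the normal regression equations. In essence, the quadratic form on the left of (4.3) is reduced to a sum of squares by the method of Lagrange (see Bôcher [1, p. 131]). The reduction is performed by the following transformation:

$$\begin{aligned}
\delta_1 &= (a_{11}d_1 + a_{12}d_2 + a_{13}d_3 + \cdots + a_{1k}d_k)\big/\sqrt{a_{11}}\\
\delta_2 &= (a_{22}'d_2 + a_{23}'d_3 + \cdots + a_{2k}'d_k)\big/\sqrt{a_{22}'}\\
&\;\;\vdots\\
\delta_k &= a_{kk}^{(k-1)}d_k\big/\sqrt{a_{kk}^{(k-1)}}
\end{aligned}\tag{4.8}$$

in which

$$a_{ij}' = a_{ij} - a_{1i}a_{1j}/a_{11},\qquad a_{ij}'' = a_{ij}' - a_{2i}'a_{2j}'/a_{22}',\qquad \ldots$$

Thus all of the entries in (4.8) arise in the regular reduction process of the forward Doolittle solution. Finally, the required transformation (4.2) is obtained by inverting (4.8). This is easy because (4.8) is triangular: $\delta_k$ involves only $d_k$, $\delta_{k-1}$ only $d_{k-1}$ and $d_k$, and so on, so the $d_i$'s follow by back substitution.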
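In present-day terms, the Lagrange reduction (4.8) can be read as a Cholesky (square-root) factorization $\|a_{ij}\| = LL^{\mathsf T}$ with $\delta = L^{\mathsf T}d$. On that reading the coefficients $m_j$ solve $Lm = h$, so that $\sum m_j^2 = \sum_i\sum_j h_i a^{ij} h_j$. The pure-Python sketch below follows that interpretation; the function names are ours. It uses the 4-decimal $a_{ij}$ entries listed in Table 2 with $h = (1, 1, 1)$. Because of that rounding, it should agree with the $\sum m_j^2 = .3445$ of the worked example only to about three decimals.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def cholesky(a: Sequence[Sequence[float]]) -> Matrix:
    """Lower-triangular L with a = L L^T; a must be symmetric positive definite."""
    k = len(a)
    low = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][p] * low[j][p] for p in range(j))
            if i == j:
                low[i][i] = math.sqrt(s)  # plays the role of sqrt(a_11), sqrt(a_22'), ...
            else:
                low[i][j] = s / low[j][j]
    return low


def m_coefficients(a: Sequence[Sequence[float]], h: Sequence[float]) -> List[float]:
    """Solve L m = h by forward substitution, so that sum(m_j**2) = h^T a^{-1} h."""
    low = cholesky(a)
    m: List[float] = []
    for i in range(len(h)):
        s = h[i] - sum(low[i][p] * m[p] for p in range(i))
        m.append(s / low[i][i])
    return m


# a_ij for log C, log S, log D as listed in Table 2 (rounded to 4 decimals).
A = [[3.2554, 3.5128, 3.4260],
     [3.5128, 7.2594, 3.5538],
     [3.4260, 3.5538, 3.6985]]
m = m_coefficients(A, [1.0, 1.0, 1.0])
print([round(x, 4) for x in m], round(sum(x * x for x in m), 4))
```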
As a numerical example, Table 2 shows the standard forward Doolittle solution for the regression coefficients of (2.1) and (2.2). The standard form is commonly found in texts (see, for example, Croxton and Cowden [2, pp. 716-20]) and is therefore suitable for illustration. For practical computation, the abbreviated Doolittle is probably superior (see Dwyer [3, pp. 107-12]). Transformation (4.8) would be derived from the italicized quantities in Table 2 as follows:

$$\begin{aligned}
\delta_1 &= (3.2554\,d_1 + 3.5128\,d_2 + 3.4260\,d_3)/1.8043\\
\delta_2 &= (3.4688\,d_2 - .1432\,d_3)/1.8625\\
\delta_3 &= .0870\,d_3/.2950.
\end{aligned}$$

This is easily inverted by solving for the $d_i$'s in terms of the $\delta_j$'s, thus

$$\begin{aligned}
d_1 &= .5542\,\delta_1 - .5794\,\delta_2 - 3.7189\,\delta_3\\
d_2 &= .5369\,\delta_2 + .1400\,\delta_3\\
d_3 &= 3.3903\,\delta_3.
\end{aligned}\tag{4.2$'$}$$

Now suppose that a subsidiary statement is desired for the sum $b_1 + b_2 + b_3$ in (2.1), whose maximum likelihood estimate is −.04. The same subsidiary statement is applicable, of course, to the exponent of C in (2.3) and of S in (2.4). By means of (4.2′) the sum is transformed thus:

$$\hat b_1 + \hat b_2 + \hat b_3 - Q = d_1 + d_2 + d_3 = \sum m_j\delta_j = .5542\,\delta_1 - .0425\,\delta_2 - .1886\,\delta_3.$$

From the above, one calculates the quantity $\sum m_j^2 = .3445$. This is substituted in (4.6) along with k = 3, n = 17, $\hat\sigma^2 = .0006863$, and $F_{.05}(3, 13) = 3.410$, and the resulting interval is $-.096 \le b_1 + b_2 + b_3 \le .016$.


5. THE JOINT CONFIDENCE ELLIPSE VERSUS CONVENTIONAL 
CONFIDENCE INTERVALS 


Since Scheffé [9] has discussed at some length the relation between the confidence ellipse and the conventional form of confidence interval for the analysis of variance, only a brief extension to regression problems is needed here. In Figure 2, which again refers to (3.2), the conventional 95 per cent limits, already given in (1.3), are shown superimposed on the ellipse of Figure 1. The hexagon A′B′C′D′E′F′ formed by the intersection of the three conventional bands is similar to, and similarly situated as, the circumscribed hexagon ABCDEF in Figure 1, and one may surmise that the smaller hexagon encloses an ellipse similar to the 95 per cent ellipse. Geometrically, then, the conventional intervals of Figure 2 bear the same relation to their inscribed ellipse that the subsidiary intervals bear to the 95 per cent ellipse. That is, the smaller ellipse is equivalent to the totality of all possible conventional intervals and provides a lower limit for the joint confidence level of a set of statements like (1.3).

Fig. 2. 95 per cent joint confidence ellipse and conventional 95 per cent confidence intervals for (3.2).

To determine the confidence level of the ellipse inscribed within A′B′C′D′E′F′, it is convenient to rewrite (1.2) as

$$\hat b_p - t_\alpha\sqrt{\text{Est. var. }\hat b_p} \;\le\; b_p \;\le\; \hat b_p + t_\alpha\sqrt{\text{Est. var. }\hat b_p}.$$
Then, on comparison with (4.7), it is apparent that

$$t_{\alpha''} = \sqrt{kF_{\alpha'}(k, n-k-1)} \tag{5.1}$$

or

$$F_{\alpha''}(1, n-k-1) = kF_{\alpha'}(k, n-k-1),$$

where $1-\alpha''$ is the confidence level of the conventional interval and $1-\alpha'$ is the level of the inscribed ellipse. The problem of finding α′ given α″, or vice versa, is conveniently solved by means of Pearson's Tables of the Incomplete Beta-Function [7]. In the example under discussion, k = 2 and $t_{.05} = 2.145$ for fourteen degrees of freedom; then (5.1) indicates $F_{\alpha'}(2, 14) = 2.3005$. By means of the transformation described by Pearson [7, p. xlvii] or Mood [5, p. 206], approximately .86 is obtained for $1-\alpha'$ from the beta-function table. Thus the probability that (1.3) is correct is bounded by .86 and .95. Table 3 presents values of $1-\alpha'$ corresponding to the conventional .95 and .99 levels of $1-\alpha''$ for several other combinations of k and n−k−1. For large samples, and in special situations where the variance is known, the limiting value of $1-\alpha'$ for $F_{\alpha'}(k, \infty)$ is obtained from Pearson's Tables of the Incomplete Gamma-Function [6].
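In two special cases the table lookups can be replaced by closed forms. For k = 2 numerator degrees of freedom, $P\{F(2,\nu) \le x\} = 1 - (1 + 2x/\nu)^{-\nu/2}$. In the known-variance limit, $kF(k,\infty)$ is a chi-square variate with k degrees of freedom, whose distribution function is a finite sum when k is even. The sketch below (function names ours) covers only these two cases. Other values of k with finite ν would still need the incomplete beta function.

```python
import math


def lower_bound_k2(t_conv: float, df: int) -> float:
    """1 - alpha' for k = 2 via (5.1): F_alpha'(2, df) = t^2 / 2, with the closed-form F(2, df) CDF."""
    f_val = t_conv ** 2 / 2.0
    return 1.0 - (1.0 + 2.0 * f_val / df) ** (-df / 2.0)


def lower_bound_known_variance(z_conv: float, k: int) -> float:
    """Limiting 1 - alpha' as df -> infinity, for even k.

    k * F(k, inf) is chi-square with k degrees of freedom, and (5.1) sets it to z^2;
    P(chi2_k <= x) = 1 - exp(-x/2) * sum_{j < k/2} (x/2)^j / j!.
    """
    if k <= 0 or k % 2:
        raise ValueError("this closed form needs a positive even k")
    half = z_conv ** 2 / 2.0
    tail = sum(half ** j / math.factorial(j) for j in range(k // 2))
    return 1.0 - math.exp(-half) * tail


print(round(2.145 ** 2 / 2, 4))             # F_alpha'(2, 14) from (5.1): 2.3005
print(round(lower_bound_k2(2.145, 14), 2))  # about .86, as found from the beta tables
```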


6. INCLUSION OF bo IN THE CONFIDENCE ELLIPSE 


In some applications it may be desirable to extend the confidence ellipse so as to cover the constant term $b_0$. Define an arbitrary independent variable $X_0 = 1$. Then $b_0$ is merely the partial regression coefficient of Y on $X_0$. Let

$$\sum_{\nu=1}^{n} X_{i\nu}X_{j\nu} = A_{ij}.$$

In particular, $A_{00} = \sum X_{0\nu}^2 = n$ and $A_{0j} = \sum X_{0\nu}X_{j\nu} = \sum X_{j\nu}$. Then, as shown by Wilks [12, Sect. 8.3] and Mood [5, Sect. 13.5], the joint confidence ellipsoid is defined by

$$F_\alpha(k+1, n-k-1) = \frac{(n-k-1)\sum_{i=0}^{k}\sum_{j=0}^{k} A_{ij}(\hat b_i - b_i)(\hat b_j - b_j)}{(k+1)n\hat\sigma^2}.$$

This, like (3.1), may be transformed into a sphere by the method of Section 4.
TABLE 3

LOWER BOUND OF PROBABILITY THAT ALL CONVENTIONAL
CONFIDENCE INTERVALS WILL BE CORRECT FOR k
REGRESSION COEFFICIENTS IN SAMPLES OF n

Conventional    Degrees of          Number of Independent Variables k
Confidence      Freedom
Level           n−k−1              3      4      5      6      8

.95                …              .79    .70    .62    .53    .38
                   …              .78    .67    .57    .47    .31
                   …              .76    .65     …     .43    .26
                   …              .75    .62     …     .39     …
                   …              .74    .61     …     .37    .19
                   …              .74    .60     …     .34    .17
                   …              .73    .59     …     .32    .15
                   …              .72    .57     …     .30    .13

.99                …              .95    .92     …     .85    .77
                   …              .94    .91     …     .81    .71
                   …              .94    .89     …     .78    .64
                   …              .93    .88     …     .74    .58
                   …              .93    .87     …      …     .54
                   …              .92    .86     …     .70    .50
                   …              .92    .85     …     .67    .47
                   …              .92    .84     …     .64    .42





CONCLUSION 


In the simon-pure application of the conventional approach, where a single statement is specified, it is possible to establish an interval with a definite probability $1-\alpha$. The same is true for certain special problems involving multiple statements, such as Tukey's problem of contrasting means. But in general, we must conclude that establishing a single probability for a set of multiple statements is either impracticable or impossible. Instead, there will be two bounds, $1-\alpha'$ and $1-\alpha''$, between which lies the probability of being right. How, then, are these bounds to be chosen?

The common procedure to date, of establishing the upper bound without regard to the lower, can hardly be justified. To take an extreme example from Table 3, what does it mean if we establish intervals for a 10-variable regression and all we can say is that the probability of being right lies somewhere between .05 and .95? The converse procedure, of establishing the lower bound without regard to the upper, as recommended here, will be criticized, no doubt, as unduly conservative. But whether this criticism is justified or not will depend on the number of statements contemplated and the degree of flexibility required in choosing them. In the bank stock example, a multiplicity of statements is indicated, and the choice of statement must be deferred until after the experiment in order to avoid missing important findings. There, every reason exists to believe that the true probability $1-\alpha$ is much closer to its lower bound than to its upper. Here the loss of efficiency in using the conservative approach cannot be very great.

There remains, of course, the perplexing problem of establishing bounds when a limited number of statements are contemplated but no convenient computation procedure is available for ascertaining $1-\alpha$. Then a possible solution is to select bounds that straddle the desired probability. For small values of k, say less than four, the bounds are not too far apart, and one might find examples for which $1-\alpha' = .90$ and $1-\alpha'' = .99$ approximately. Perhaps this provides a working approximation for the .95 level, on the hope that $1-\alpha$ lies about midway between its bounds.
ASYMPTOTIC RELATIVE EFFICIENCIES OF DISTRIBUTION-FREE
TESTS OF RANDOMNESS AGAINST NORMAL ALTERNATIVES


ALAN STUART
London School of Economics


1. THE MEASURE OF EFFICIENCY


Several writers, notably Hotelling and Pabst [5], have explicitly assumed that the relative efficiency of two test statistics is to be measured by their estimating efficiencies. While this seems reasonable, it is by no means obvious. If the two tests are consistent, the ratio of their powers against any fixed alternative hypothesis must tend to unity with increasing sample size n, and it may easily be shown that for any n, the less efficient estimator may provide a more powerful test (Sundrum [14]).

Pitman [11] has proposed a measure of the asymptotic relative efficiency of consistent tests. He assumed that the two statistics, $t_1$ and $t_2$, have normal limit distributions with variances of order $n^{-1}$, and that certain general regularity conditions are satisfied. He then considered a limiting process in which the alternative hypothesis $H_1$ differs from the null hypothesis $H_0$ by a quantity of order $n^{-1/2}$, so that as n increases, $H_1$ tends to $H_0$. Under these conditions, he showed that the reciprocal of the ratio of sample sizes required to attain equal power against the same alternative was, in the limit,

$$\frac{\left\{\dfrac{\partial E(t_1)}{\partial\theta}\right\}^2_{\theta=\theta_0}\Big/ V(t_1\mid\theta=\theta_0)}{\left\{\dfrac{\partial E(t_2)}{\partial\theta}\right\}^2_{\theta=\theta_0}\Big/ V(t_2\mid\theta=\theta_0)}, \tag{1}$$

where θ is the parameter whose value distinguishes $H_0$ from $H_1$ (with $\theta = \theta_0$ under $H_0$), and E and V denote mean value and variance respectively.

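Some such limiting process is necessary if we require a single measure
of the relative efficiency of two tests, but it is not altogether surprising
that Pitman’s result is equivalent to the use of estimating efficiency as
a criterion. With $t_1$ and $t_2$ as our (consistent) test statistics, let

$$T_1 = f(t_1), \qquad T_2 = g(t_2)$$

be transformations of them which are consistent estimators of the
underlying parameter $\theta$. For large samples,

$$V(T_1) = \left(\frac{\partial f}{\partial t_1}\right)^2 V(t_1).$$

If $T_1 = \theta\{1+O(n^{-a})\}$ and $t_1 = E(t_1)\{1+O(n^{-c})\}$, where $a$, $c$ are positive
constants, we may write,¹ to our order of approximation,

$$\frac{\partial f}{\partial t_1} = \frac{\partial f/\partial\theta}{\partial t_1/\partial\theta} \approx \frac{1}{\partial E(t_1)/\partial\theta},$$

so that

$$V(T_1) = V(t_1)\Big/\left\{\frac{\partial E(t_1)}{\partial\theta}\right\}^2.$$

Similarly

$$V(T_2) = V(t_2)\Big/\left\{\frac{\partial E(t_2)}{\partial\theta}\right\}^2,$$

and hence

$$\frac{V(T_2)}{V(T_1)} = \frac{(E_1')^2/V_1}{(E_2')^2/V_2} \qquad (2)$$

in a shortened notation, with $E_r' = \partial E(t_r)/\partial\theta$ and $V_r = V(t_r)$.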

At $\theta=\theta_0$ the right side of (2) is equivalent to Pitman’s measure (1), and
since the left side of (2) is the ordinary estimating efficiency of the
transformed statistics on $H_0$, it follows that Pitman’s measure simply
reproduces the estimating properties of the appropriate transformations
of the test statistics being considered. As a simple example, the
sign test for the median reproduces the efficiency property of the
sample median as an estimator of the population mean. (The result is
$2/\pi$ for normal populations.)

Thus Pitman’s result on power may be regarded as a justification of 
the procedure of using estimating efficiency as a test criterion. 


2. TESTS OF RANDOMNESS AGAINST NORMAL REGRESSION
ALTERNATIVES


Using Pitman’s measure, we may investigate tests of randomness for 
the standardised normal regression model 


$$y_i = \alpha + \beta X_i + \epsilon_i,$$





1 Mr. J. Durbin has considerably simplified this derivation. 
where $\epsilon$ is a normal vector with $E(\epsilon)=0$ and $V(\epsilon)=I$. If the values of
$X$ are spaced at equal intervals, no generality is lost in replacing them
by the numbers 1 to $n$.

With this model, $H_0$ is the hypothesis that $\beta=0$, so that $\beta$ is the
underlying parameter.

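The standard test in this situation is based on the sample statistic

$$b = \sum_i y_i(X_i - \bar X)\Big/\sum_i (X_i - \bar X)^2,$$

which is exactly normally distributed with mean $\beta$ and variance
$1/\sum_i (X_i - \bar X)^2$. Since we have replaced the $X_i$ by the natural
numbers, $\sum_i (X_i - \bar X)^2 = n(n^2-1)/12$ and, for the statistic $b$, we have

$$\frac{(E')^2}{V} = \frac{n(n^2-1)}{12} \sim \frac{n^3}{12}. \qquad (3)$$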


3. PRELIMINARY RESULTS 


For application in succeeding sections, we require a few preliminary 
results for the normal regression model. 
Define

$$H_{ij} = \begin{cases} 1 & \text{if } y_i > y_j,\\ 0 & \text{if } y_i < y_j,\end{cases}$$

and

$$H_{ii} = 1.$$


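(a) Now $E(H_{ij}) = \text{Prob}\{H_{ij}=1\}$, and since $(y_i - y_j)$ is a normal variate
with mean $\beta(i-j)$ and variance 2, this is

$$E(H_{ij}) = \int_0^\infty N\{\beta(i-j), 2\} = \int_{-\beta(i-j)/\sqrt2}^{\infty} N(0, 1),$$

in an obvious notation.
Thus

$$\left[\frac{\partial E(H_{ij})}{\partial\beta}\right]_{\beta=0} = \frac{i-j}{\sqrt2}\cdot\frac{1}{\sqrt{2\pi}} = \frac{i-j}{2\sqrt\pi}. \qquad (4)$$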
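As a quick numerical check of (4), the slope of $E(H_{ij}) = \Phi\{\beta(i-j)/\sqrt2\}$ at $\beta=0$ can be estimated by a central difference and compared with $(i-j)/(2\sqrt\pi)$. This is a minimal Python sketch; the helper names and the step size `h` are illustrative choices, not part of the paper.

```python
import math


def std_normal_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_h(i, j, beta):
    """E(H_ij) = Prob(y_i > y_j) when y_k = alpha + beta*k + e_k, e_k ~ N(0, 1).

    y_i - y_j is normal with mean beta*(i - j) and variance 2.
    """
    return std_normal_cdf(beta * (i - j) / math.sqrt(2.0))


def slope_at_zero(i, j, h=1e-6):
    """Central-difference estimate of dE(H_ij)/d(beta) at beta = 0."""
    return (expected_h(i, j, h) - expected_h(i, j, -h)) / (2.0 * h)


def slope_formula(i, j):
    """Closed form (4): (i - j) / (2 * sqrt(pi))."""
    return (i - j) / (2.0 * math.sqrt(math.pi))


for i, j in [(2, 1), (5, 2), (3, 10)]:
    print(i, j, round(slope_at_zero(i, j), 6), round(slope_formula(i, j), 6))
```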
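(b) Also, when the pairs $(i, j)$ and $(k, l)$ have no index in common,
$E(H_{ij}H_{kl}) = E(H_{ij})E(H_{kl})$, so that

$$\frac{\partial}{\partial\beta}E(H_{ij}H_{kl}) = E(H_{ij})\,\frac{\partial E(H_{kl})}{\partial\beta} + E(H_{kl})\,\frac{\partial E(H_{ij})}{\partial\beta}$$

and, from (4), since $E(H_{ij}) = E(H_{kl}) = \tfrac12$ at $\beta=0$,

$$\left[\frac{\partial}{\partial\beta}E(H_{ij}H_{kl})\right]_{\beta=0} = \frac12\cdot\frac{k-l}{2\sqrt\pi} + \frac12\cdot\frac{i-j}{2\sqrt\pi} = \frac{i+k-j-l}{4\sqrt\pi}. \qquad (5)$$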

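(c) Since

$$\frac{d}{d\theta}\int^{b}\phi(u)\,du = \phi(b)\,\frac{db}{d\theta},$$

we have

$$\frac{d}{d\theta}\int^{a}\!\!\int^{b} f(u, v)\,dv\,du = \dot a\int^{b} f(a, v)\,dv + \dot b\int^{a} f(u, b)\,du, \qquad (6)$$

where a dot denotes differentiation with respect to $\theta$, and similarly, with a change
of sign, for variables at the lower limits of integration.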
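Now $(y_i - y_j)$ and $(y_j - y_k)$ are jointly normally distributed with
correlation $-\tfrac12$. Thus

$$E(H_{ij}H_{jk}) = \text{Prob}(H_{ij}=1,\ H_{jk}=1) = \int_{-\beta(i-j)/\sqrt2}^{\infty}\int_{-\beta(j-k)/\sqrt2}^{\infty} \phi_{-1/2}(u, v)\,dv\,du,$$

where $\phi_{-1/2}$ is the standardised bivariate normal density with correlation $-\tfrac12$.
Applying (6), appropriately modified, and noting that each boundary term
at $\beta=0$ involves $\int_0^\infty \phi_{-1/2}(0, v)\,dv = 1/(2\sqrt{2\pi})$, we obtain

$$\left[\frac{\partial}{\partial\beta}E(H_{ij}H_{jk})\right]_{\beta=0} = \frac{(i-j)+(j-k)}{\sqrt2}\cdot\frac{1}{2\sqrt{2\pi}} = \frac{i-k}{4\sqrt\pi}. \qquad (7)$$

Similarly, since $(y_i - y_k)$ and $(y_j - y_k)$ have correlation $+\tfrac12$, we see that

$$\left[\frac{\partial}{\partial\beta}E(H_{ik}H_{jk})\right]_{\beta=0} = \frac{i+j-2k}{4\sqrt\pi}. \qquad (8)$$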
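Hence, summarising results (4)-(8) and remembering that $H_{ii}=1$
identically, we have

$$\left[\frac{\partial}{\partial\beta}E(H_{ij}H_{kl})\right]_{\beta=0} = \frac{i+k-j-l}{4\sqrt\pi} \quad \text{if } j \ne i \text{ and } l \ne k,$$

$$\left[\frac{\partial}{\partial\beta}E(H_{ii}H_{kl})\right]_{\beta=0} = \frac{k-l}{2\sqrt\pi} \quad \text{for all } k, l. \qquad (9)$$
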
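(d) Finally, we consider the probability that the product
$(y_i - y_{i-1})(y_{i-1} - y_{i-2})$ is negative. This requires that one of the brackets is positive
and the other negative; writing $T_i$ for the indicator of this event, its probability is

$$E(T_i) = 2\int_{-\beta/\sqrt2}^{\infty}\int_{-\infty}^{-\beta/\sqrt2} \phi_{-1/2}(u, v)\,dv\,du,$$

with $\phi_{-1/2}$ the standardised bivariate normal density with correlation $-\tfrac12$, and,
applying (6), suitably modified, we find that the terms cancel and

$$\left[\frac{\partial E(T_i)}{\partial\beta}\right]_{\beta=0} = 0. \qquad (10)$$

We now proceed to calculate $(E')^2/V$ for five well-known distribution-free
tests of randomness, where the null hypothesis is that all the
observations have come from the same continuous population. In the
normal regression model, this is equivalent to testing $H_0\colon \beta=0$.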


4. THE DIFFERENCE-SIGN TEST


The test criterion, proposed by Moore and Wallis [8], is simply the
number of positive first differences in the series. Stuart [12] has made
explicit power calculations against normal regression alternatives. The
test statistic is

$$D = \sum_{i=2}^{n} H_{i,i-1},$$

so that, from (4),

$$\left[\frac{\partial E(D)}{\partial\beta}\right]_{\beta=0} = \frac{n-1}{2\sqrt\pi}.$$

Also, as is easily shown (Stuart [12]),

$$V(D) = \frac{1}{12}(n+1).$$
So for $D$,

$$\frac{(E')^2}{V} \sim \frac{3n}{\pi}. \qquad (11)$$
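The difference-sign statistic and the efficacy in (11) are easy to compute. The Python sketch below counts positive first differences, standardises $D$ with $E(D) = (n-1)/2$ (each difference is positive with probability one half under randomness) and $V(D) = (n+1)/12$, and evaluates the exact ratio $(E')^2/V$ whose leading term is $3n/\pi$. Function names are illustrative, and using a normal approximation for the $z$ value is an assumption of the sketch.

```python
import math


def difference_sign_d(y):
    """D = number of positive first differences, i.e. the sum of H_{i,i-1}."""
    return sum(1 for prev, cur in zip(y, y[1:]) if cur > prev)


def difference_sign_z(y):
    """Standardised D, using E(D) = (n - 1)/2 and V(D) = (n + 1)/12 under randomness."""
    n = len(y)
    return (difference_sign_d(y) - (n - 1) / 2.0) / math.sqrt((n + 1) / 12.0)


def efficacy_d(n):
    """(E')^2 / V for D: slope (n - 1)/(2 sqrt(pi)) squared, divided by (n + 1)/12."""
    slope = (n - 1) / (2.0 * math.sqrt(math.pi))
    return slope ** 2 / ((n + 1) / 12.0)


series = [1, 3, 2, 4]
print(difference_sign_d(series))   # two rises: 1 -> 3 and 2 -> 4
print(round(difference_sign_z(series), 4))
print(efficacy_d(10_000) / (3 * 10_000 / math.pi))  # close to 1, as in (11)
```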


5. KENDALL’S RANK CORRELATION TEST


Mann [7] suggested the use of Kendall’s rank correlation coefficient,
$t$, for testing randomness. We shall consider the quantity $Q$ related to $t$
by

$$t = 1 - \frac{4Q}{n(n-1)}.$$

$Q$ may be defined by

$$Q = \sum_{i<j} H_{ij},$$

i.e. we take all possible $\tfrac12 n(n-1)$ comparisons between pairs of observations,
and score unity whenever an observation exceeds a later observation.

From (4),

$$\left[\frac{\partial E(Q)}{\partial\beta}\right]_{\beta=0} = \sum_{i<j}\frac{i-j}{2\sqrt\pi} = -\frac{n(n^2-1)}{12\sqrt\pi}.$$

Also (see, e.g., Kendall [6])

$$V(Q) \sim n^3/36,$$

so that for $Q$,

$$\frac{(E')^2}{V} \sim \frac{n^3}{4\pi}. \qquad (12)$$

6. SPEARMAN’S RANK CORRELATION TEST 


Spearman’s rank correlation coefficient, $r_s$, can clearly be used
wherever $t$ can, and has been considered by Daniels [2] as a test against
trend. We shall consider the quantity $V$ related to $r_s$ by

$$r_s = 1 - \frac{12V}{n(n^2-1)}.$$
It may easily be shown (see Durbin and Stuart [3]) that $V$ may be
defined analogously to $Q$ by

$$V = \sum_{i<j} (j-i)H_{ij},$$

i.e. $V$ is a weighted sum of the $H$-scores, the weights being the distance
separating the observations contributing the score. Now

$$E(V) = \sum_{i<j}(j-i)E(H_{ij})$$

and, by (4),

$$\left[\frac{\partial E(V)}{\partial\beta}\right]_{\beta=0} = -\sum_{i<j}\frac{(j-i)^2}{2\sqrt\pi} = -\frac{n^2(n^2-1)}{24\sqrt\pi}.$$

Also (see Kendall [6]) it is easily shown that the variance of $V$ is
asymptotically $n^5/144$.
Thus

$$\frac{(E')^2}{V} \sim \frac{n^3}{4\pi}, \qquad (13)$$

just as for Kendall’s test.
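The pairwise definitions of $Q$ and $V$ translate directly into code. The Python sketch below (illustrative names; it assumes no ties) computes $Q = \sum_{i<j} H_{ij}$, computes $V$ both as $\sum_{i<j}(j-i)H_{ij}$ and as $\frac12\sum_i (r_i - i)^2$ from the ranks $r_i$, and converts them to $t$ and $r_s$ with the relations above. The two forms of $V$ agree, which is why the rank form is the cheaper route in practice.

```python
def ranks(y):
    """Rank of each observation, 1 = smallest (assumes no ties)."""
    order = sorted(range(len(y)), key=lambda k: y[k])
    r = [0] * len(y)
    for rank, k in enumerate(order, start=1):
        r[k] = rank
    return r


def kendall_q(y):
    """Q: number of pairs i < j in which the earlier observation exceeds the later."""
    n = len(y)
    return sum(1 for i in range(n) for j in range(i + 1, n) if y[i] > y[j])


def spearman_v_pairs(y):
    """V as a weighted sum of H-scores: sum over i < j of (j - i) * H_ij."""
    n = len(y)
    return sum(j - i for i in range(n) for j in range(i + 1, n) if y[i] > y[j])


def spearman_v_ranks(y):
    """V = (1/2) * sum of (r_i - i)^2, which needs only the ranks."""
    return sum((r - i) ** 2 for i, r in enumerate(ranks(y), start=1)) / 2


def kendall_t(y):
    n = len(y)
    return 1 - 4 * kendall_q(y) / (n * (n - 1))


def spearman_rs(y):
    n = len(y)
    return 1 - 12 * spearman_v_ranks(y) / (n * (n * n - 1))


y = [2, 3, 1]
print(kendall_q(y), spearman_v_pairs(y), spearman_v_ranks(y))  # 2 3 3.0
print(kendall_t(y), spearman_rs(y))
```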


7. THE TURNING POINT TEST


Another test, proposed by Wallis and Moore [16], consists in counting
the number of runs up and down in the series or, equivalently, the number
of peaks and troughs in the series. This statistic was considered
earlier from another point of view by Bilham [1]. We score

$$T_i = \begin{cases} 1 & \text{if } (y_i - y_{i-1})(y_{i-1} - y_{i-2}) < 0,\\ 0 & \text{otherwise,}\end{cases}$$

and the sum of the $(n-2)$ scores $T_i$ is our statistic $T$. Now we have seen in
equation (10) that

$$\left[\frac{\partial E(T_i)}{\partial\beta}\right]_{\beta=0} = 0.$$

Thus, for $T$ also,

$$E' = 0, \qquad (14)$$

and the relative efficiency of the test is zero.
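The turning point statistic is equally simple to compute. In this Python sketch (illustrative names), each interior point contributes $T_i = 1$ when the successive differences change sign. Under randomness each of the $n-2$ interior points is a peak or trough with probability 2/3, so $E(T) = 2(n-2)/3$; the zero slope in (10) means that a small trend leaves this expectation unchanged to first order in $\beta$.

```python
def turning_points(y):
    """T = number of interior points with (y_i - y_{i-1}) * (y_{i-1} - y_{i-2}) < 0,
    i.e. the number of peaks and troughs in the series."""
    return sum(1 for a, b, c in zip(y, y[1:], y[2:]) if (c - b) * (b - a) < 0)


def expected_turning_points(n):
    """E(T) under randomness: each of the n - 2 interior points turns with probability 2/3."""
    return 2 * (n - 2) / 3


print(turning_points([1, 3, 2, 4]))  # a peak at 3 and a trough at 2
print(turning_points([1, 2, 3, 4]))  # a monotone series has none
print(expected_turning_points(20))
```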


8. THE RANK SERIAL CORRELATION TEST
Since $H_{ii}=1$, the rank of $y_i$ among the $n$ $y$’s is

$$r_i = \sum_{j=1}^{n} H_{ij}.$$

Consider

$$r_i r_k = \sum_{j=1}^{n}\sum_{l=1}^{n} H_{ij}H_{kl} = \sum_{j\ne i}\sum_{l\ne k} H_{ij}H_{kl} + \sum_{l\ne k} H_{ii}H_{kl} + \sum_{j\ne i} H_{ij}H_{kk} + H_{ii}H_{kk}.$$

From (9) we obtain immediately

$$\left[\frac{\partial E(r_i r_k)}{\partial\beta}\right]_{\beta=0} = \frac{1}{4\sqrt\pi}\Big\{\sum_{j\ne i}\sum_{l\ne k}(i+k-j-l) + 2\sum_{l\ne k}(k-l) + 2\sum_{j\ne i}(i-j)\Big\}$$

$$= \frac{n(n+1)}{4\sqrt\pi}\,(i+k-n-1). \qquad (15)$$

Now the rank serial correlation coefficient of lag $s$ is, neglecting constants,

$$W = \sum_{i=1}^{n-s} r_i r_{i+s} \qquad (16)$$

when a non-circular definition is used, or

$$W' = W + \sum_{i=1}^{s} r_{n-s+i}\,r_i \qquad (17)$$

when a circular definition is used. These statistics are special cases,
using ranks, of the serial correlation coefficient proposed as a distribution-free
test against trend by Wald and Wolfowitz [15].

If in (15) we put $k = i+s$, the non-constant factor becomes
$(2i - n + s - 1)$, and since

$$\sum_{i=1}^{n-s}(2i - n + s - 1) = 0,$$

we see from (16) that

$$\left[\frac{\partial E(W)}{\partial\beta}\right]_{\beta=0} = 0. \qquad (18)$$

Similarly, if we put $k = n - s + i$ in (15), the non-constant factor becomes
$(2i - s - 1)$, and since

$$\sum_{i=1}^{s}(2i - s - 1) = 0,$$

we see from (17) and (18) that

$$\left[\frac{\partial E(W')}{\partial\beta}\right]_{\beta=0} = 0. \qquad (19)$$

Thus, for any lag, the circular or non-circular rank serial correlation
coefficient has relative efficiency zero as a test against normal regression.
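The two versions of the rank serial statistic, and the vanishing sums behind (18) and (19), can be checked directly. The Python sketch below uses illustrative names and assumes no ties; the lists are 0-based, so `r[i]` holds $r_{i+1}$.

```python
def ranks(y):
    """Rank of each observation, 1 = smallest (assumes no ties)."""
    order = sorted(range(len(y)), key=lambda k: y[k])
    r = [0] * len(y)
    for rank, k in enumerate(order, start=1):
        r[k] = rank
    return r


def rank_serial_w(y, s, circular=False):
    """W of (16), or W' of (17) when circular=True, for lag s."""
    r = ranks(y)
    n = len(r)
    w = sum(r[i] * r[i + s] for i in range(n - s))        # sum_{i=1}^{n-s} r_i r_{i+s}
    if circular:
        w += sum(r[n - s + i] * r[i] for i in range(s))   # sum_{i=1}^{s} r_{n-s+i} r_i
    return w


def noncircular_factor_sum(n, s):
    """Sum over i = 1..n-s of the factor (2i - n + s - 1) from (15) with k = i + s."""
    return sum(2 * i - n + s - 1 for i in range(1, n - s + 1))


def circular_factor_sum(s):
    """Sum over i = 1..s of the factor (2i - s - 1) from (15) with k = n - s + i."""
    return sum(2 * i - s - 1 for i in range(1, s + 1))


y = [10, 30, 20, 40]  # ranks 1, 3, 2, 4
print(rank_serial_w(y, 1), rank_serial_w(y, 1, circular=True))
print(all(noncircular_factor_sum(n, s) == 0 for n in range(2, 30) for s in range(1, n)))
print(all(circular_factor_sum(s) == 0 for s in range(1, 30)))
```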


9. COMPARISONS AND CONCLUSIONS 


Collecting our results (3), (11), (12), (13), (14), (18) and (19), we
obtain, using (1), the following table for the asymptotic relative efficiencies
of the tests:

                                        Asymptotic Relative Efficiency
Test                                    Compared to b     Compared to D
Regression coefficient (b)              1                 ∞
Spearman’s test (V)                     3/π ≈ .95         ∞
Kendall’s test (Q)                      3/π ≈ .95         ∞
Difference-sign test (D)                0                 1
Turning point test (T)                  0                 0
Rank serial correlation test (W)        0                 0

The rank correlation tests are highly and equally efficient, agreeing
with the results of Daniels [2], who considers a more general model.
Our value $3/\pi$ is not identical with the value $9/\pi^2$ established for their
efficiency as tests of bivariate independence by Hotelling and Pabst
[5] and by Moran [9]: here we have been dealing with an essentially
univariate problem, and since the null hypothesis is one of independence,
it is not surprising that we get for our efficiency the square root
of the bivariate efficiency. (An “explanation” of this result is given by
Stuart [13].) When we take into consideration the computing time for
each statistic, Spearman’s test is to be preferred, the formula

$$V = \frac12\sum_{i=1}^{n}(r_i - i)^2$$

giving it a clear advantage over Kendall’s test, especially for large $n$.

As will be seen from the table, all three other tests have asymptotic 
relative efficiencies of zero, although, as the last column shows, D is 
to be preferred to the other two, which have zero relative efficiency for 
any value of n. (The result for W has previously been obtained by 
Noether [10].) Care must be taken in interpreting these results, for, 
since the tests are all consistent against this alternative hypothesis, we 
can always make the power of any of them as close to 1 as we please by 
increasing sample size indefinitely. The situation is analogous to (and, 
as shown in section 1 above, a reflection of) that arising with a con- 
sistent, but highly inefficient, estimator. The relative efficiencies which 
we have calculated are local properties, in the sense that they are re- 
stricted to neighbourhoods of the null hypothesis which are small 
enough, when sample size is taken into account, to keep the powers of 
the tests bounded away from unity. Reference should be made here to 
a sampling experiment reported by Foster and Stuart [4] which closely 
bears out the results of the table above. 

While $D$ is very much simpler to compute than any of the other statistics,
this fact does not close the gap between it and $V$. For (11) and
(13) show that $V$ has an efficiency advantage of order $n^2$, while if
computing time is proportional to the number of comparisons to be
made between observations, the advantage to $D$, with $(n-1)$ comparisons
as against $\tfrac12 n(n-1)$ for $V$, is only of order $n$.
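The first column of the table, and the order-of-$n$ comparison just made, follow from the efficacies $(E')^2/V$. This Python sketch evaluates them for a given $n$, using the slopes obtained from (4), namely $n(n^2-1)/(12\sqrt\pi)$ for $Q$ and $n^2(n^2-1)/(24\sqrt\pi)$ for $V$ in absolute value, together with the asymptotic variances quoted above; the function names are illustrative. As $n$ grows, the ratios to $b$ approach $3/\pi$ for both rank correlation tests and fall towards zero, roughly like $36/(\pi n^2)$, for $D$.

```python
import math

SQRT_PI = math.sqrt(math.pi)


def efficacy_b(n):
    """(3): regression coefficient, (E')^2 / V = n(n^2 - 1)/12."""
    return n * (n * n - 1) / 12


def efficacy_d(n):
    """(11): difference-sign test, slope (n - 1)/(2 sqrt(pi)), V(D) = (n + 1)/12."""
    return ((n - 1) / (2 * SQRT_PI)) ** 2 / ((n + 1) / 12)


def efficacy_q(n):
    """(12): Kendall's Q, |slope| n(n^2 - 1)/(12 sqrt(pi)), variance ~ n^3/36."""
    return (n * (n * n - 1) / (12 * SQRT_PI)) ** 2 / (n ** 3 / 36)


def efficacy_v(n):
    """(13): Spearman's V, |slope| n^2(n^2 - 1)/(24 sqrt(pi)), variance ~ n^5/144."""
    return (n * n * (n * n - 1) / (24 * SQRT_PI)) ** 2 / (n ** 5 / 144)


for n in (10, 100, 1000):
    b = efficacy_b(n)
    print(n, round(efficacy_v(n) / b, 4), round(efficacy_q(n) / b, 4), efficacy_d(n) / b)
print("3/pi =", round(3 / math.pi, 4))
```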


10. SUMMARY 


It is shown that against normal regression alternatives, the two rank 
correlation tests are to be preferred to three other distribution-free 
tests, these being the difference-sign test, the rank serial correlation 
coefficient test and the turning point test. Further, on computational 
grounds alone, Spearman’s rank correlation test is to be preferred to 
Kendall’s test. These results do not, of course, apply to other alterna- 
tive hypotheses, such as the presence of serial correlation. 
ESTIMATION OF THE POISSON PARAMETER FROM
TRUNCATED SAMPLES AND FROM CENSORED
SAMPLES*


A. C. COHEN, JR.
University of Georgia 


Maximum likelihood estimators of the Poisson parameter 
applicable to both truncated and censored samples are de- 
rived in this paper. Singly and doubly truncated samples as 
well as singly and doubly censored samples are considered. 
The estimators obtained are presented in simple algebraic 
forms and their application to practical problems with the 
aid of standard Poisson tables is illustrated with numerical 
examples. Asymptotic variances of estimates for the different 
cases considered are obtained from second derivatives of the 
likelihood functions and are simplified to forms which permit 
ready evaluation. 


1. INTRODUCTION


THE Poisson distribution is an appropriate mathematical model for
such diverse classes of discrete data as haemocytometer
counts of blood cells per square, the number of noxious weed seed per 
unit of field seed, and the number of defects per unit of a manufac- 
tured product. It is thus of interest to the biologist, the agronomist, 
and the quality control engineer as well as to research workers in vari- 
ous other fields of scientific endeavor. When sample observation is per- 
mitted over the full range of the complete distribution, the estimation 
problem is quite simple. In that case, the maximum likelihood estimate 
of the population parameter is the sample mean. When the sample is 
truncated or otherwise restricted, as for example when the number of 
zero observations is unknown or when observations of higher counts are 
pooled, the estimation problem increases in complexity. Various aspects 
of estimation involving singly truncated and singly censored Poisson 
samples with known terminals have been considered by Tippett [7], 
Bliss [1], Rider [5], David and Johnson [2], and by Moore [4]. Accord- 
ing to terminology which has recently come into popular usage, trun- 
cated samples are understood to be those from which the number of 
observations eliminated by the restricting process is unknown. Cen- 
sored samples are those in which the total number of sample specimens 
is known, but measurements on some of this number are lacking. Censored
samples may thus be regarded as truncated samples having a
known number of unmeasured (missing) observations. In this paper 
and in the references cited, interest lies only with samples truncated 
or censored such that all observations above or below specified terminals
are either eliminated or unmeasured; that is, with samples in which
the restriction applies only to the tails of the sample. The classification 
of single or double indicates whether one or both tails have been re- 
stricted. Tippett obtained the maximum likelihood estimator for a sam- 
ple that is singly censored on the right, but left his results in a some- 
what unwieldy form for practical application when more than four of 
the individual frequency classes are available. For four or fewer frequency
classes, he provided nomograms to aid in computing the required esti- 
mates. Bliss developed an approximation to Tippett’s estimator and 
provided two tables necessary for applying his procedure. Moore was 
also concerned with this case and developed an estimator based on 
sample moment functions. Rider developed an estimator based on 
moment functions for the case of samples singly truncated on the left 
and also considered maximum likelihood estimation for the same case. 
David and Johnson were likewise concerned with maximum likelihood 
estimation in this latter case when the zero frequency class is the only 
one missing. The present paper is more general and of greater extent 
than the references cited above. It is concerned with maximum likeli- 
hood estimation from singly and doubly truncated samples as well as 
from singly and doubly censored samples, all with known terminals. 
Estimators derived here are expressed in simple algebraic forms for 
easy application to practical problems. 


2. THE POISSON DISTRIBUTION 
The complete Poisson distribution function may be expressed as

$$f(x, m) = \frac{e^{-m}m^x}{x!}, \qquad x = 0, 1, 2, \ldots, \qquad (1)$$

where $f(x, m)$ is thus the probability of observing exactly $x$ occurrences
of the event studied. The cumulative probability that $c$ or more occurrences
will be observed may be written as

$$P(c, m) = \sum_{x=c}^{\infty}\frac{e^{-m}m^x}{x!}. \qquad (2)$$
As stated in the introduction, the maximum likelihood estimate of $m$
based on a complete (unrestricted) random sample is given as

$$\hat m = \bar x = \sum_{i=1}^{n} x_i/n, \qquad (3)$$

where $n$ is the total number of sample observations. In what follows,
corresponding estimators are derived for various types of truncated
and censored samples. Where no confusion is likely to result, the notation
is simplified by writing $f(x)$ and $P(c)$ in place of the longer $f(x, m)$
and $P(c, m)$.


3. TRUNCATED SAMPLES: NUMBER UNMEASURED (MISSING)
OBSERVATIONS UNKNOWN


Doubly truncated. The probability function of a Poisson distribution
truncated on the left at $x=c$ and on the right at $x=d$ may be written
as

$$f(x) = \begin{cases} 0, & x < c,\\[2pt] \dfrac{1}{P(c) - P(d+1)}\,\dfrac{e^{-m}m^x}{x!}, & c \le x \le d,\\[2pt] 0, & x > d.\end{cases} \qquad (4)$$

The truncated distribution (4) is thus normalized so that

$$\sum_{x=0}^{\infty} f(x) = \sum_{x=c}^{d} f(x) = 1. \qquad (5)$$


The likelihood function of a random sample of $n$ observations from a
population distributed according to (4) may be written as

$$P(x_1, x_2, \ldots, x_n) = [P(c) - P(d+1)]^{-n}\,e^{-nm}\,m^{\sum x_i}\Big/\prod_{i=1}^{n} x_i!. \qquad (6)$$


We obtain this same likelihood function when we consider the popula- 
tion as being complete with frequency function (1) and consider the 
sample as being truncated with the restriction that sample observation 
can include neither count nor measurement beyond the truncation 
points. In general, the writer prefers to view the truncation or other 
restrictions as being imposed on the sample rather than on the popula- 
tion. 
Taking logarithms of (6) and writing $n\bar x$ in place of $\sum_{i=1}^{n} x_i$, where $\bar x$
is thus the mean of the truncated sample, we have

$$L = -n\ln[P(c) - P(d+1)] - nm + n\bar x\ln m - \ln\Big[\prod_{i=1}^{n} x_i!\Big]. \qquad (7)$$

Differentiating (7) and equating the result to zero, we obtain the estimating
equation

$$\frac1n\frac{dL}{dm} = \frac{\bar x}{m} - 1 - \frac{f(c-1) - f(d)}{P(c) - P(d+1)} = 0. \qquad (8)$$

For a clearer understanding of how (8) was obtained from (7), the
following details are included on determining $dP(c)/dm$. From (2), we
have

$$\frac{dP(c)}{dm} = \sum_{x=c}^{\infty}\frac{d}{dm}\left[\frac{e^{-m}m^x}{x!}\right] = \sum_{x=c}^{\infty}\frac{e^{-m}\left(x m^{x-1} - m^x\right)}{x!}$$

$$= \sum_{x=c}^{\infty}\frac{e^{-m}m^{x-1}}{(x-1)!} - \sum_{x=c}^{\infty}\frac{e^{-m}m^x}{x!} = P(c-1) - P(c),$$

and finally

$$\frac{dP(c)}{dm} = f(c-1). \qquad (9)$$
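Identity (9) is easy to confirm numerically. The Python sketch below (illustrative names) evaluates $f(x, m)$ and $P(c, m)$ from (1) and (2), treating $f(x, m)$ as zero for negative $x$ and taking $P(c, m) = 1$ for $c \le 0$, and compares a central-difference derivative of $P(c, m)$ with $f(c-1, m)$.

```python
import math


def f(x, m):
    """Poisson probability f(x, m) of (1); zero for negative x."""
    return math.exp(-m) * m ** x / math.factorial(x) if x >= 0 else 0.0


def P(c, m):
    """P(c, m) of (2): probability of c or more occurrences (equal to 1 for c <= 0)."""
    return 1.0 - sum(f(x, m) for x in range(c))


def dP_dm(c, m, h=1e-6):
    """Central-difference estimate of dP(c)/dm."""
    return (P(c, m + h) - P(c, m - h)) / (2.0 * h)


for c in (1, 2, 9):
    print(c, round(dP_dm(c, 3.8), 6), round(f(c - 1, 3.8), 6))
```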

With the aid of a set of Poisson tables such as those of Molina [3], 
equation (8) can be solved for the required estimate, $\hat m$, by elementary
iterative procedures, one of which is illustrated in Section 7. 

Singly truncated on the left. For a sample that is singly truncated on
the left, the estimating equation (8) becomes

$$\frac1n\frac{dL}{dm} = \frac{\bar x}{m} - 1 - \frac{f(c-1)}{P(c)} = 0, \qquad (10)$$

since here $d \to \infty$, $\lim_{d\to\infty} f(d) = 0$, and $\lim_{d\to\infty} P(d+1) = 0$. With $c=1$,
estimating equation (10) is applicable in the special case of an unknown
number of zero observations, the case considered in [2].

Singly truncated on the right. In this case $c=0$, $f(c-1)=0$, and
$P(c)=1$. Accordingly the estimating equation becomes

$$\frac1n\frac{dL}{dm} = \frac{\bar x}{m} - 1 + \frac{f(d)}{1 - P(d+1)} = 0. \qquad (11)$$


4. CENSORED SAMPLES: NUMBER UNMEASURED OBSERVATIONS
IN EACH TAIL KNOWN

Doubly censored. Let $n_1$ and $n_2$ be the number of unmeasured observations
in the left and right tails respectively and let $n$ be the number
of measured observations for which $c \le x \le d$. The likelihood function
for a sample of this type drawn from the population (1) is

$$P(x_1, x_2, \ldots, x_{n+n_1+n_2}) = K[1 - P(c)]^{n_1}\left[\frac{e^{-nm}m^{\sum x_i}}{x_1!\cdots x_n!}\right][P(d+1)]^{n_2}, \qquad (12)$$

where $K$ is a constant, and other symbols are as previously defined.
Taking logarithms of (12), differentiating with the aid of (9) and equating
to zero, we obtain the estimating equation

$$\frac1n\frac{dL}{dm} = \frac{\bar x}{m} - 1 - \frac{n_1}{n}\,\frac{f(c-1)}{1 - P(c)} + \frac{n_2}{n}\,\frac{f(d)}{P(d+1)} = 0. \qquad (13)$$

Singly censored on the left. In this case $n_2=0$, and the estimating equation
(13) becomes

$$\frac1n\frac{dL}{dm} = \frac{\bar x}{m} - 1 - \frac{n_1}{n}\,\frac{f(c-1)}{1 - P(c)} = 0. \qquad (14)$$

When $c=1$, a singly censored sample with the number of unmeasured
observations known is actually a complete rather than a restricted
sample, since $n_1$ is simply the number of zeros in the sample and the
total sample size is $n+n_1$. In this case, since $1 - P(1) = f(0)$, equation (14) becomes

$$\frac1n\frac{dL}{dm} = \frac{\bar x}{m} - 1 - \frac{n_1}{n} = 0, \qquad\text{so that}\qquad \hat m = \frac{n\bar x}{n+n_1} = \frac{\sum x_i}{n+n_1},$$

which agrees with equation (3) for a complete sample.
Singly censored on the right. In this instance $n_1=0$, and the estimating
equation (13) becomes

$$\frac1n\frac{dL}{dm} = \frac{\bar x}{m} - 1 + \frac{n_2}{n}\,\frac{f(d)}{P(d+1)} = 0. \qquad (15)$$

This is recognized as the appropriate estimating equation for the case
in which all sample observations for which $x>d$ have been pooled.
Tippett’s estimator (loc. cit.) applies to this case.


5. CENSORED SAMPLES: TOTAL NUMBER UNMEASURED OBSERVATIONS
KNOWN BUT NOT THE NUMBER IN EACH TAIL SEPARATELY


Let $c$ and $d$ designate the terminals as in Sections 3 and 4. Let $n$ be
the number of measured observations for which $c \le x \le d$, and $n_0$ the
combined number of unmeasured observations in the two tails. The
likelihood function for a sample of this type from the population (1)
is then

$$P(x_1, x_2, \ldots, x_{n+n_0}) = K[1 - P(c) + P(d+1)]^{n_0}\left[\frac{e^{-nm}m^{\sum x_i}}{x_1!\cdots x_n!}\right]. \qquad (16)$$

Taking logarithms, differentiating and equating to zero, we have the
estimating equation

$$\frac1n\frac{dL}{dm} = \frac{\bar x}{m} - 1 + \frac{n_0}{n}\,\frac{f(d) - f(c-1)}{1 - P(c) + P(d+1)} = 0. \qquad (17)$$

We note that the singly censored cases in this instance are identical
with those of Section 4. When censored on the left, $n_0 = n_1$, $\lim_{d\to\infty} f(d) = 0$,
and $\lim_{d\to\infty} P(d+1) = 0$. Accordingly, (17) assumes the same form
as (14). When censored on the right only, $n_0 = n_2$, $c=0$, $f(c-1)=0$,
$P(c)=1$, and (17) reduces to the same form as (15).


6. VARIANCE OF ESTIMATES


Since maximum likelihood estimation has been employed, the
asymptotic variance of $\hat m$ can be expressed as

$$V(\hat m) \approx -\left[\frac{d^2L}{dm^2}\right]^{-1}_{m=\hat m}. \qquad (18)$$

Second derivatives for the various cases considered are given below.
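TRUNCATED SAMPLES: NUMBER MISSING
OBSERVATIONS UNKNOWN


Doubly Truncated

$$\frac1n\frac{d^2L}{dm^2} = -\frac{\bar x}{m^2} - \frac{f(c-2) - f(c-1) - f(d-1) + f(d)}{P(c) - P(d+1)} + \left[\frac{f(c-1) - f(d)}{P(c) - P(d+1)}\right]^2 \qquad (19)$$

Singly Truncated on Left

$$\frac1n\frac{d^2L}{dm^2} = -\frac{\bar x}{m^2} - \frac{f(c-2) - f(c-1)}{P(c)} + \left[\frac{f(c-1)}{P(c)}\right]^2 \qquad (20)$$

Singly Truncated on Right

$$\frac1n\frac{d^2L}{dm^2} = -\frac{\bar x}{m^2} + \frac{f(d-1) - f(d)}{1 - P(d+1)} + \left[\frac{f(d)}{1 - P(d+1)}\right]^2 \qquad (21)$$


CENSORED SAMPLES: NUMBER OBSERVATIONS
IN EACH TAIL KNOWN


Doubly Censored

$$\frac1n\frac{d^2L}{dm^2} = -\frac{\bar x}{m^2} - \frac{n_1}{n}\left[\frac{f(c-2) - f(c-1)}{1 - P(c)} + \left(\frac{f(c-1)}{1 - P(c)}\right)^2\right] + \frac{n_2}{n}\left[\frac{f(d-1) - f(d)}{P(d+1)} - \left(\frac{f(d)}{P(d+1)}\right)^2\right] \qquad (22)$$

Singly Censored on Left

$$\frac1n\frac{d^2L}{dm^2} = -\frac{\bar x}{m^2} - \frac{n_1}{n}\left[\frac{f(c-2) - f(c-1)}{1 - P(c)} + \left(\frac{f(c-1)}{1 - P(c)}\right)^2\right] \qquad (23)$$

Singly Censored on Right

$$\frac1n\frac{d^2L}{dm^2} = -\frac{\bar x}{m^2} + \frac{n_2}{n}\left[\frac{f(d-1) - f(d)}{P(d+1)} - \left(\frac{f(d)}{P(d+1)}\right)^2\right] \qquad (24)$$


CENSORED SAMPLES: COMBINED NUMBER
OBSERVATIONS IN TAILS KNOWN

$$\frac1n\frac{d^2L}{dm^2} = -\frac{\bar x}{m^2} + \frac{n_0}{n}\left[\frac{f(d-1) - f(d) - f(c-2) + f(c-1)}{1 - P(c) + P(d+1)} - \left(\frac{f(d) - f(c-1)}{1 - P(c) + P(d+1)}\right)^2\right] \qquad (25)$$

Derivatives given in this section can be evaluated with the aid of
Molina’s Tables as previously mentioned.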
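As an illustration of how (18) is used with these second derivatives, the Python sketch below evaluates (19) and (24) and converts them to $\sigma_{\hat m}$. The inputs in the usage lines ($\hat m$, $\bar x$, $n$, $n_2$, $c$, $d$) are the values of Illustrations 1 and 3 in Section 7; exact Poisson functions stand in for Molina's tables, and the helper names are illustrative.

```python
import math


def f(x, m):
    """Poisson probability f(x, m); zero for negative x."""
    return math.exp(-m) * m ** x / math.factorial(x) if x >= 0 else 0.0


def P(c, m):
    """Probability of c or more occurrences."""
    return 1.0 - sum(f(x, m) for x in range(c))


def curv_doubly_truncated(m, xbar, c, d):
    """(1/n) d^2L/dm^2 for a doubly truncated sample, as in (19)."""
    denom = P(c, m) - P(d + 1, m)
    second = (f(c - 2, m) - f(c - 1, m) - f(d - 1, m) + f(d, m)) / denom
    first = (f(c - 1, m) - f(d, m)) / denom
    return -xbar / m ** 2 - second + first ** 2


def curv_right_censored(m, xbar, n, n2, d):
    """(1/n) d^2L/dm^2 for a sample singly censored on the right, as in (24)."""
    tail = P(d + 1, m)
    return -xbar / m ** 2 + (n2 / n) * ((f(d - 1, m) - f(d, m)) / tail - (f(d, m) / tail) ** 2)


def sigma_hat(n, curvature):
    """(18): square root of -1 / (d^2L/dm^2), where d^2L/dm^2 = n * curvature."""
    return math.sqrt(-1.0 / (n * curvature))


# Illustration 1: n = 2565, n2 = 43, d = 8, m-hat = 3.871
s1 = sigma_hat(2565, curv_right_censored(3.871, 9683 / 2565, 2565, 43, 8))
# Illustration 3: n = 2305, c = 2, d = 8, m-hat = 3.874
s3 = sigma_hat(2305, curv_doubly_truncated(3.874, 9480 / 2305, 2, 8))
print(round(s1, 2), round(s3, 2))
```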










7. PRACTICAL APPLICATIONS 


For each case considered in this paper, the estimating equation can 
be solved for $m$ without too much difficulty by a simple inverse interpolative
process, provided a set of Poisson tables such as those of
Molina [3] is available. As a first approximation to $m$, the sample mean,
$\bar x$, or some obvious modification thereof, will prove satisfactory in
many instances. Where applicable, estimates given by Moore (loc. cit.) 
or by Rider (loc. cit.) may provide closer first approximations. The 
illustrations which follow serve to clarify these points. 

The data of Table 1, due to Rutherford and Geiger [6], will be used 
to illustrate how estimates are calculated from samples. These data 
concern the number of α particles observed in an eighth-of-a-minute
time interval. Observations were recorded for 2608 such intervals, with
$x$ designating the number of particles observed during an interval, and
$f_0(x)$ designating the frequency or number of intervals during which
these observations were made. Observations of nine or more particles 
per time interval were pooled. Various cases considered in this paper 
are illustrated with these same data by appropriately changing basic 
assumptions regarding the sample. 


TABLE 1

Particles per interval, x       0    1    2    3    4    5    6    7    8   9 or more   Total
Number of intervals, f_0(x)    57  203  383  525  532  408  273  139   45       43      2608

Illustration 1 


Sample singly censored on the right. We first use the maximum information
provided by the sample, which includes $n=2565$ measured
observations and $n_2=43$ unmeasured or pooled observations in the
sample right tail. The terminus is at $d=8$, and $\sum_{x=0}^{8} x f_0(x) = 9683$.
Accordingly, $\bar x = 9683/2565 = 3.7750$. The appropriate estimating equation
is (15), which we solve using as a first approximation $m_1 = \bar x = 3.8$
(rounded off). From Molina’s tables, we have $f(8, 3.8) = 0.024123$ and
$P(9, 3.8) = 0.015984$. On substituting these values in (15) we obtain

$$\left[\frac1n\frac{dL}{dm}\right]_{m=3.8} = \frac{3.7750}{3.8} - 1 + \frac{43}{2565}\left[\frac{0.024123}{0.015984}\right] = +0.01872.$$

Similarly, we compute

$$\left[\frac1n\frac{dL}{dm}\right]_{m=3.9} = \frac{3.7750}{3.9} - 1 + \frac{43}{2565}\left[\frac{0.026869}{0.018533}\right] = -0.00775.$$

To determine the required value, $\hat m$, for which $(1/n)\,dL/dm = 0$, we
interpolate linearly¹ as summarized below.

     m        (1/n) dL/dm
   3.800       +0.01872
   3.871        0.00000
   3.900       -0.00775

Thus our estimate is $\hat m = 3.871$, and using (18) and (24), we compute
$\sigma_{\hat m} = \sqrt{V(\hat m)} \approx 0.04$.


Illustration 2 


Sample singly truncated on the right. Here we neglect $n_2$ and assume
that the number of observations in the truncated tail is unknown.
Otherwise the sample remains the same as for Illustration 1. Estimating
equation (11) is applicable in this instance, and on substituting the
necessary values, we have

$$\left[\frac1n\frac{dL}{dm}\right]_{m=3.8} = \frac{3.7750}{3.8} - 1 + \frac{0.024123}{1 - 0.015984} = +0.01794,$$

$$\left[\frac1n\frac{dL}{dm}\right]_{m=3.9} = \frac{3.7750}{3.9} - 1 + \frac{0.026869}{1 - 0.018533} = -0.00468.$$

1 More precise interpolation formulas involving second and higher order differences might be re- 
quired to give the desired accuracy in some applications. 
Interpolating as in the previous illustration, we have $\hat m = 3.879$, and
from (18) and (21), $\sigma_{\hat m} \approx 0.04$.


Illustration 3 


Doubly truncated sample. For this illustration, we arbitrarily eliminate
the first two classes of Table 1 in addition to the pooled classes for
measurements greater than 8, and assume the number of missing observations
to be unknown. In this instance the terminals are $c=2$ and
$d=8$. Completing the sample summary, we have $\sum_{x=2}^{8} x f_0(x) = 9480$,
$n=2305$, and $\bar x = 9480/2305 = 4.112798$. The appropriate estimating
equation is (8), and from Molina’s tables we find $f(1, 3.8) = 0.085009$,
$f(8, 3.8) = 0.024123$, $P(2, 3.8) = 0.892620$, $P(9, 3.8) = 0.015984$, $f(1, 3.9) = 0.078943$,
$f(8, 3.9) = 0.026869$, $P(2, 3.9) = 0.900815$, and $P(9, 3.9) = 0.018533$. On substituting
these values in (8) we have

$$\left[\frac1n\frac{dL}{dm}\right]_{m=3.8} = \frac{4.112798}{3.8} - 1 - \left[\frac{0.085009 - 0.024123}{0.892620 - 0.015984}\right] = +0.01286,$$

$$\left[\frac1n\frac{dL}{dm}\right]_{m=3.9} = \frac{4.112798}{3.9} - 1 - \left[\frac{0.078943 - 0.026869}{0.900815 - 0.018533}\right] = -0.00446.$$

Interpolating, we have $\hat m = 3.874$, and from (18) and (19), $\sigma_{\hat m} \approx 0.05$.


Illustration 4 


Sample doubly censored with number unmeasured observations in each
tail known. Data for this illustration are the same as for Illustration 3
with the added information that $n_1 = 260$ and $n_2 = 43$. The applicable
estimating equation is (13), and on making necessary substitutions, we
obtain

$$\left[\frac1n\frac{dL}{dm}\right]_{m=3.8} = \frac{4.112798}{3.8} - 1 - \frac{260}{2305}\left[\frac{0.085009}{1 - 0.892620}\right] + \frac{43}{2305}\left[\frac{0.024123}{0.015984}\right] = +0.02117,$$

$$\left[\frac1n\frac{dL}{dm}\right]_{m=3.9} = \frac{4.112798}{3.9} - 1 - \frac{260}{2305}\left[\frac{0.078943}{1 - 0.900815}\right] + \frac{43}{2305}\left[\frac{0.026869}{0.018533}\right] = -0.00817.$$

Interpolating, we find $\hat m = 3.872$, and from (18) and (22), $\sigma_{\hat m} \approx 0.04$.


Illustration 5


Sample doubly censored with total number unmeasured observations
known, but not number in each tail separately. For this illustration we assume
the knowledge that $n_0 = 303$, but that $n_1$ and $n_2$ separately are unknown.
Otherwise, the data remain the same as for Illustrations 3 and
4. Estimating equation (17) is applicable, and after making the necessary
substitutions, we have

$$\left[\frac1n\frac{dL}{dm}\right]_{m=3.8} = \frac{4.112798}{3.8} - 1 + \frac{303}{2305}\left[\frac{0.024123 - 0.085009}{1 - 0.892620 + 0.015984}\right] = +0.01744,$$

$$\left[\frac1n\frac{dL}{dm}\right]_{m=3.9} = \frac{4.112798}{3.9} - 1 + \frac{303}{2305}\left[\frac{0.026869 - 0.078943}{1 - 0.900815 + 0.018533}\right] = -0.00359.$$

Interpolation yields $\hat m = 3.883$, and from (18) and (25), $\sigma_{\hat m} \approx 0.05$.

We dispense with illustrations of the remaining sample types since 
solution of applicable estimating equations can be accomplished in the 
same manner as in the five illustrations presented. To solve estimating 
equations for any of the cases considered in this paper, standard itera- 
tive procedures such as Newton’s method could be employed, but for 
simplicity and ease of application, the interpolative procedure illus- 
trated seems preferable. 
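The inverse-interpolation procedure of the five illustrations can be sketched in a few lines of Python. The sketch computes $f$ and $P$ exactly instead of reading Molina's tables, evaluates $(1/n)\,dL/dm$ from (15), (11), (8), (13) and (17) at the trial values 3.8 and 3.9, and interpolates linearly. The class frequencies are assumed to be Rutherford and Geiger's published counts (57, 203, 383, 525, 532, 408, 273, 139, 45 for $x = 0, \ldots, 8$, with 43 intervals of nine or more), which reproduce every total quoted in the illustrations; all function names are illustrative.

```python
import math


def f(x, m):
    """Poisson probability f(x, m) of (1); zero for negative x."""
    return math.exp(-m) * m ** x / math.factorial(x) if x >= 0 else 0.0


def P(c, m):
    """P(c, m) of (2): probability of c or more occurrences."""
    return 1.0 - sum(f(x, m) for x in range(c))


# Assumed class frequencies f_0(x) for x = 0..8; 43 intervals showed nine or more.
FREQ = [57, 203, 383, 525, 532, 408, 273, 139, 45]
POOLED = 43


def summary(c, d):
    """Number of measured observations and their mean for classes c <= x <= d."""
    n = sum(FREQ[c:d + 1])
    return n, sum(x * FREQ[x] for x in range(c, d + 1)) / n


def interpolate_root(score, m1=3.8, m2=3.9):
    """Linear inverse interpolation for the m at which score(m) = 0."""
    g1, g2 = score(m1), score(m2)
    return m1 + (m2 - m1) * g1 / (g1 - g2)


n_r, xbar_r = summary(0, 8)   # 2565 observations, mean 3.7750
n_c, xbar_c = summary(2, 8)   # 2305 observations, mean 4.112798
n1 = FREQ[0] + FREQ[1]        # 260 observations below c = 2
c, d = 2, 8

scores = {
    # (15) singly censored on the right
    1: lambda m: xbar_r / m - 1 + (POOLED / n_r) * f(d, m) / P(d + 1, m),
    # (11) singly truncated on the right
    2: lambda m: xbar_r / m - 1 + f(d, m) / (1 - P(d + 1, m)),
    # (8) doubly truncated
    3: lambda m: xbar_c / m - 1 - (f(c - 1, m) - f(d, m)) / (P(c, m) - P(d + 1, m)),
    # (13) doubly censored, tail counts known
    4: lambda m: (xbar_c / m - 1 - (n1 / n_c) * f(c - 1, m) / (1 - P(c, m))
                  + (POOLED / n_c) * f(d, m) / P(d + 1, m)),
    # (17) doubly censored, only the combined count n0 = n1 + n2 known
    5: lambda m: (xbar_c / m - 1 + ((n1 + POOLED) / n_c) * (f(d, m) - f(c - 1, m))
                  / (1 - P(c, m) + P(d + 1, m))),
}

estimates = {k: interpolate_root(g) for k, g in scores.items()}
for k, m_hat in estimates.items():
    print(f"Illustration {k}: m-hat = {m_hat:.3f}")
```

Newton's method, mentioned in the text, would converge to the exact root of each equation; the linear interpolation is kept here because it mirrors the hand procedure.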
TABLES OF THE EXPECTED VALUE OF 1/X FOR
POSITIVE BERNOULLI AND POISSON VARIABLES


EDWIN L. GRAB AND I. RICHARD SAVAGE
National Bureau of Standards 


SUMMARY OF TABLES 


THE random variable $X$ is said to have a positive Bernoulli distribution
[11]¹ if the probability that $X = x$ is equal to $\binom{n}{x}p^x q^{n-x}(1-q^n)^{-1}$
for $x = 1, 2, \ldots, n$, where $q = 1-p$ and $0 < p \le 1$. Similarly the variable
$X$ is said to have a positive Poisson distribution if the probability that
$X = x$ is equal to $e^{-m}(1-e^{-m})^{-1}m^x/x!$ for $x = 1, 2, \ldots$, and $m > 0$.
Table I gives the values of $E(1/X \mid n, p)$ to five decimal places, where

$$E(1/X \mid n, p) = (1-q^n)^{-1}\sum_{x=1}^{n}\frac1x\binom{n}{x}p^x q^{n-x}, \qquad (q = 1-p), \qquad (1)$$

for the following values of the parameters:
n = 2(1)20;  p = .01, .05(.05).95, .99
n = 21(1)30; p = .01, .05(.05).50.
Table II gives values of $E(1/X \mid m)$ to five decimal places, where

$$E(1/X \mid m) = e^{-m}(1-e^{-m})^{-1}\sum_{x=1}^{\infty}\frac{m^x}{x\cdot x!}, \qquad (2)$$

for the following values of the parameter:
m = .01, .05(.05)1.0(.1)2.0(.2)5.0(.5)7.0(.1)10(2)20.

PREPARATION OF TABLES 


Table I was originally prepared directly from (1) using Tables of the 
Binomial Probability Distribution [9]. Subsequently the table was 
checked by using the recurrence relation 


1 q” n 
@)  B0/X|9+1,9) = > +47 BO/z ls 


Table II was prepared directly from (2) making use of tables of the 
Poisson distribution [4], [5], [6]. Some of the values were checked by 





1 This article by Stephan contains much material of interest related to the contents of this paper.
In particular it gives a precise formulation of the mathematical situation in which these tables are
applicable in sampling problems. Also it presents many more results of interest.
the use of

$$E(1/X \mid m) = [\mathrm{Ei}(m) - \gamma - \log_e m]\,e^{-m}/(1 - e^{-m}), \qquad (4)$$

making use of [7], [8], and [10].
The entries in Tables I and II were obtained by two independent
computations to ensure five-decimal accuracy.
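Formulas (1) to (3) can be reproduced with a short Python sketch: the first function sums (1) directly, the second builds the same value up from $n = 1$ (where $X = 1$ with certainty) with the recurrence (3), and the third sums the series in (2). The function names and the stopping tolerance are illustrative; the series for (2) is used instead of the exponential-integral form (4) because Python's standard library has no $\mathrm{Ei}$ function.

```python
import math


def e_inv_binomial(n, p):
    """E(1/X | n, p) for the positive Bernoulli distribution, summing (1) directly."""
    q = 1.0 - p
    total = sum(math.comb(n, x) * p ** x * q ** (n - x) / x for x in range(1, n + 1))
    return total / (1.0 - q ** n)


def e_inv_binomial_recursive(n, p):
    """The same quantity built up from n = 1 with the recurrence (3)."""
    q = 1.0 - p
    e = 1.0  # n = 1: X = 1 with certainty
    for k in range(1, n):  # step from k to k + 1
        e = 1.0 / (k + 1) + q * (1.0 - q ** k) / (1.0 - q ** (k + 1)) * e
    return e


def e_inv_poisson(m, tol=1e-15):
    """E(1/X | m) for the positive Poisson distribution, summing the series in (2)."""
    term, x, series = 1.0, 0, 0.0
    while True:
        x += 1
        term *= m / x  # term = m**x / x!
        series += term / x
        if term / x < tol * series:
            break
    return math.exp(-m) / (1.0 - math.exp(-m)) * series


print(round(e_inv_binomial(2, 0.5), 5), round(e_inv_binomial_recursive(2, 0.5), 5))
print(round(e_inv_poisson(0.01), 5))
```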


USE OF TABLES 


The following situation often arises in sampling problems. 
An observation $y_i$ $(i = 1, \ldots, x)$ is made on each of $x$ individuals and
the average,

$$Y = (y_1 + y_2 + \cdots + y_x)/x,$$

is computed.

If the $y_i$’s are independent observations on a random variable with
mean value $\mu$ and variance $\sigma^2$, we find that the mean value of $Y$ is $\mu$.
However, if $x$, the sample size, is a random variable, then the variance
of $Y$ is not $\sigma^2/x$ but is $\sigma^2E(1/X)$. Thus one use of the tables is in finding
the variances of means when the sample sizes are random variables
with either positive Bernoulli or Poisson distributions. Extensive discussions
of the above situation can be found in [2], [3], [11], and [12].

Some typical situations where $X$ has a positive Bernoulli or Poisson
distribution are:

1) In estimating the average number of acres per farm planted with cotton,
among those farms of a sample having any cotton planted.

2) In estimating the average weight of animals that will survive a certain
experiment, where the probability of an animal dying is constant for each
animal and independent of the other animals.

3) In estimating the average cost of fires in a certain city by examining the
cost of all fires that occurred in a short time interval.


In using the tables one must have good estimates of p or m. How- 
ever, the above examples are typical in that they cover situations where 
one is likely to have good estimates of the important quantities, i.e., 
proportion of farms growing cotton, the lethality of an experiment, and 
the average number of fire alarms per day. 

In some sampling problems the sample size follows other discrete distributions, such as the hypergeometric. Tables of E(1/X) for this distribution would be difficult to prepare, since the distribution itself is so poorly tabulated and since there are three parameters involved. Hence, if in dealing with this distribution one feels that the binomial or Poisson approximations are not adequate, one could perform the desired computations for the specific situation at hand. For some discrete distributions the formula for E(1/X) is simple, as for example the geometric distribution.
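For the geometric case we take P(X = x) = p q^(x−1) for x = 1, 2, …. This parametrization is an assumption, since the text does not give one. Under it the series sums in closed form, E(1/X) = (p/q) Σ q^x/x = −p log_e p / q. The Python sketch below compares that closed form with a truncated direct sum. The function names are illustrative.

```python
import math


def e_inv_geometric_closed(p: float) -> float:
    """E(1/X) for P(X = x) = p * q**(x - 1), x = 1, 2, ... (0 < p < 1): equals -p*ln(p)/q."""
    q = 1.0 - p
    return -p * math.log(p) / q


def e_inv_geometric_series(p: float, terms: int = 10_000) -> float:
    """The same quantity by summing the defining series term by term (truncated)."""
    q = 1.0 - p
    return sum(p * q ** (x - 1) / x for x in range(1, terms + 1))
```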


APPROXIMATIONS 


Stephan [11] gives several methods for computing (1), but none of these gives a simple approximation for the entire range of parameters covered by Table I. His approximations are advantageous for larger values of n than those covered by the present tables (see his examples).

Finkner [3] suggests that for large values of np the following relationship will hold:

$$\frac{1}{np} < E(1/X \mid n, p) < \frac{1}{np - 1}. \tag{5}$$

In preparing Table I it was noted that for large values of np a very good approximation to E(1/X|n, p) is given by

$$\frac{1}{np - q}. \tag{6}$$

We are told by one of the referees that (6) also appears in an unpublished manuscript of W. A. Hendricks. The bounds in (5) and the approximation (6) suffer from the disadvantage that there is no theory as to when they are good approximations. On the other hand, it is clear that they are sometimes poor. 1/(np - 1) can take on negative values, and 1/np, 1/(np - q), and 1/(np - 1) can take on values larger than one, whereas E(1/X|n, p) always lies between 0 and 1 because X ≥ 1.
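A small Python comparison of the exact value with the bounds (5) and the approximation (6) is given below. The helper names are illustrative. For n = 30, p = .5 the three quantities bracket and approximate the exact value closely. For n = 5, p = .1 they show the failures just described: 1/(np - 1) is negative and 1/np exceeds one. The formulas are undefined when np equals 1 or q.

```python
import math


def e_inv_binomial(n, p):
    """Exact E(1/X | n, p) for the positive binomial, by direct summation."""
    q = 1.0 - p
    s = sum(math.comb(n, x) * p**x * q ** (n - x) / x for x in range(1, n + 1))
    return s / (1.0 - q**n)


def quick_approximations(n, p):
    """Return (1/np, 1/(np - 1), 1/(np - q)): the two sides of (5) and approximation (6)."""
    q = 1.0 - p
    mean = n * p
    return 1.0 / mean, 1.0 / (mean - 1.0), 1.0 / (mean - q)


for n, p in [(30, 0.5), (5, 0.1)]:
    exact = e_inv_binomial(n, p)
    low, high, approx = quick_approximations(n, p)
    print(f"n={n:2d} p={p:.2f} exact={exact:.5f} 1/np={low:.5f} "
          f"1/(np-1)={high:.5f} 1/(np-q)={approx:.5f}")
```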

As an immediate consequence of the Schwarz inequality, $1 = \{E[X^{1/2} \cdot X^{-1/2}]\}^{2} \le E(X)\,E(1/X)$, we find that for any positive random variable the following inequality is valid:

$$E(1/X) \ge 1/E(X). \tag{7}$$

Since $E(X \mid n, p) = np/(1 - q^{n})$ and $E(X \mid m) = m/(1 - e^{-m})$, this inequality in the case of the positive Bernoulli and Poisson distributions becomes

$$E(1/X \mid n, p) \ge (1 - q^{n})/np \tag{8}$$

and

$$E(1/X \mid m) \ge (1 - e^{-m})/m, \tag{9}$$

respectively. In (8) the equality holds if either n = 1 or p = 1.
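The Python sketch below is a quick numerical check of (8) and (9), including the equality cases n = 1 and p = 1. The names are illustrative. The Poisson series is simply truncated after 200 terms, which is ample for the range m ≤ 20 of Table II.

```python
import math


def e_inv_binomial(n, p):
    """Exact E(1/X | n, p) for the positive binomial, by direct summation."""
    q = 1.0 - p
    s = sum(math.comb(n, x) * p**x * q ** (n - x) / x for x in range(1, n + 1))
    return s / (1.0 - q**n)


def e_inv_poisson(m, terms=200):
    """Exact E(1/X | m) for the positive Poisson, from series (2) cut off after `terms` terms."""
    s, power = 0.0, 1.0
    for x in range(1, terms + 1):
        power *= m / x  # m**x / x!
        s += power / x
    return s * math.exp(-m) / -math.expm1(-m)


def lower_bound_binomial(n, p):
    """Right-hand side of (8): (1 - q**n) / (n p)."""
    return (1.0 - (1.0 - p) ** n) / (n * p)


def lower_bound_poisson(m):
    """Right-hand side of (9): (1 - e**-m) / m."""
    return -math.expm1(-m) / m
```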
Using the inequality

$$\frac{1}{x} \le \frac{1}{x+1} + \frac{3}{(x+1)(x+2)}, \tag{10}$$

which is valid for x ≥ 1 (the right side exceeds the left by $2(x-1)/[x(x+1)(x+2)]$), we have the following inequality for random variables which cannot take on values less than one:

$$E(1/X) \le E\left[\frac{1}{X+1}\right] + 3E\left[\frac{1}{(X+1)(X+2)}\right]. \tag{11}$$

In the case where X has a positive Bernoulli distribution this gives

$$E(1/X \mid n, p) \le \frac{1}{(n+1)p(1 - q^{n})}\left[\{1 - P(1 \mid n+1, p)\} + \frac{3\{1 - P(2 \mid n+2, p)\}}{(n+2)p}\right], \tag{12}$$

where P(i | r, p) is the probability that a Bernoulli variable with parameters r and p will be less than or equal to i. Finally, when X has a positive Poisson distribution we obtain

$$E(1/X \mid m) \le \frac{1}{(1 - e^{-m})m}\left[\{1 - P(1 \mid m)\} + \frac{3\{1 - P(2 \mid m)\}}{m}\right], \tag{13}$$

where P(i | m) is the probability that a Poisson variable with parameter m will be less than or equal to i.
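The upper bounds (12) and (13) need only binomial and Poisson cumulative probabilities. The Python sketch below evaluates them so they can be compared with exact values. The names are illustrative. Summing the right side of (11) directly gives an independent check on the closed form (12).

```python
import math


def e_inv_binomial(n, p):
    """Exact E(1/X | n, p) for the positive binomial, by direct summation."""
    q = 1.0 - p
    s = sum(math.comb(n, x) * p**x * q ** (n - x) / x for x in range(1, n + 1))
    return s / (1.0 - q**n)


def e_inv_poisson(m, terms=200):
    """Exact E(1/X | m) for the positive Poisson, from series (2) cut off after `terms` terms."""
    s, power = 0.0, 1.0
    for x in range(1, terms + 1):
        power *= m / x  # m**x / x!
        s += power / x
    return s * math.exp(-m) / -math.expm1(-m)


def binom_cdf(i, r, p):
    """P(i | r, p): probability that a binomial(r, p) variable is at most i."""
    return sum(math.comb(r, k) * p**k * (1.0 - p) ** (r - k) for k in range(i + 1))


def poisson_cdf(i, m):
    """P(i | m): probability that a Poisson(m) variable is at most i."""
    return sum(math.exp(-m) * m**k / math.factorial(k) for k in range(i + 1))


def upper_bound_binomial(n, p):
    """Right-hand side of (12)."""
    q = 1.0 - p
    lead = 1.0 / ((n + 1) * p * (1.0 - q**n))
    return lead * ((1.0 - binom_cdf(1, n + 1, p))
                   + 3.0 * (1.0 - binom_cdf(2, n + 2, p)) / ((n + 2) * p))


def upper_bound_poisson(m):
    """Right-hand side of (13)."""
    lead = 1.0 / (-math.expm1(-m) * m)
    return lead * ((1.0 - poisson_cdf(1, m)) + 3.0 * (1.0 - poisson_cdf(2, m)) / m)
```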

This paper is primarily concerned with obtaining exact values of E(1/X|n, p) and E(1/X|m). Techniques for finding the limiting distributions and moments of reciprocals of positive Bernoulli and Poisson random variables are available in sections 27.7 and 28.4 of [1].

INTERPOLATION AND EXTRAPOLATION 


One entry in each column of Table I bears an asterisk. For values of p equal to or larger than the one corresponding to the entry with an asterisk, the approximation 1/(np - q) is accurate to at least two decimal places and usually to two significant figures. In general, if np > 10, it has been found that 1/(np - q) gives at least two-place accuracy. These statements are empirical. It has not been possible to derive them mathematically, but they have been observed to hold in many computations.

If n < 10 and p is above the asterisk, linear interpolation gives results accurate to two decimal places.

For the cases not covered in the two preceding paragraphs, more complicated interpolation formulas might be advantageous. In particular, when np or m is greater than 5, it has been noted that interpolation of the functions pE(1/X|n, p) and mE(1/X|m) gives better results than direct linear interpolation of E(1/X).

For values of n greater than 30, it has been found that if np <2 one 
may set np=m and use Table II. This method seems always to yield 
at least two significant figures. 
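The Poisson shortcut for n > 30 and np < 2 can be tried directly. In the Python sketch below, the names and values are illustrative. Taking n = 50 and p = .03 gives m = np = 1.5, whose Table II entry is .66594, and the sketch compares the two.

```python
import math


def e_inv_binomial(n, p):
    """Exact E(1/X | n, p) for the positive binomial, by direct summation."""
    q = 1.0 - p
    s = sum(math.comb(n, x) * p**x * q ** (n - x) / x for x in range(1, n + 1))
    return s / (1.0 - q**n)


def e_inv_poisson(m, terms=200):
    """Exact E(1/X | m) for the positive Poisson, from series (2) cut off after `terms` terms."""
    s, power = 0.0, 1.0
    for x in range(1, terms + 1):
        power *= m / x  # m**x / x!
        s += power / x
    return s * math.exp(-m) / -math.expm1(-m)


n, p = 50, 0.03
exact = e_inv_binomial(n, p)   # value Table I would give if extended to n = 50
approx = e_inv_poisson(n * p)  # Table II entry for m = np = 1.5
```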

Formula (3) is easy to apply and may be found useful for extending 
Table I to other values of n as needed. 
TABLE II
$E(1/X \mid m) = e^{-m}(1 - e^{-m})^{-1} \sum_{x=1}^{\infty} m^{x}/(x \cdot x!)$

   m     E(1/X|m)        m     E(1/X|m)
  .01     .99750        1.9     .59351
  .05     .98754        2.0     .57659
  .10     .97514        2.2     .54417
  .15     .96282        2.4     .51361
  .20     .95058        2.6     .48488
  .25     .93842        2.8     .45792
  .30     .92636        3.0     .43268
  .35     .91435        3.2     .40909
  .40     .90244        3.4     .38707
  .45     .89062        3.6     .36654
  .50     .87889        3.8     .34742
  .55     .86725        4.0     .32963
  .60     .85571        4.2     .31308
  .65     .84426        4.4     .29770
  .70     .83292        4.6     .28340
  .75     .82166        4.8     .27012
  .80     .81052        5.0     .25777
  .85     .79948        5.5     .23055
  .90     .78854        6.0     .20779
  .95     .77771        6.5     .18866
  1.0     .76699        7.0     .17249
  1.1     .74587        8       .14689
  1.2     .72520        9       .12776
  1.3     .70499       10       .11302
  1.4     .68523       12       .09190
  1.5     .66594       14       .07749
  1.6     .64712       16       .06702
  1.7     .62878       18       .05906
  1.8     .61091       20       .05280

STATISTICAL ABSTRACTS 


This section, an experiment, will present abstracts of articles on statistical 
methods. Such articles appear in numerous journals and it is the aim of this 
section to call attention to them by brief summaries. The object of each 
abstract is not a critical review of an article but a statement in as nonmathe- 
matical terms as possible of the statistical problem considered, the character 
of the methods used to solve it, and the results obtained. 

Certain journals which normally contain articles of statistical interest will be abstracted regularly. These are American Journal of Public Health, Annals of Eugenics, Annals of Mathematical Statistics, Biometrics, Biometrika, Calcutta Statistical Bulletin, Econometrica, Human Biology, Journal of Agricultural Sciences, Journal of the Royal Statistical Society (Series B), Psychometrika, Sankhyā, and Sociometry. Most articles on statistical methods appearing in these journals in 1953 or later will be abstracted. Papers on statistical methods published elsewhere will be included as they come to the attention of the Abstracts Editor; readers are invited to submit to him (at the address below) abstracts or suggestions of papers for abstracting.

The usefulness of the section will depend on the thoroughness of coverage, the quality, and the style of the abstracts. Criticisms and suggestions are invited. The Department of Statistics of the University of North Carolina has accepted the responsibility for the section and will depend not only on the faculty and graduate students of the Institute of Statistics but also on correspondents from other universities, government, and industry.

All communications concerning this section should be addressed to the 
Abstracts Editor, Professor George E. Nicholson, Jr., Chairman of the De- 
partment of Statistics, University of North Carolina, Chapel Hill, North 
Carolina. 


Abelson, R. P., “A note on the Neyman- 
Johnson technique,” Psychometrika, 18 
(1953), 213-18. 

In a multivariate prediction problem, for what values of the predictor variables will two groups differ significantly on the criterion variable? The region of significance of the Neyman-Johnson technique is defined as the set of points of the predictor space where one group is significantly better than the other on the criterion variable. These latter authors have provided an analytic definition of this region for the case of three predictor variables. The present author generalizes the solution to any number of predictors. A ratio approximating the generalized region of significance is proposed, and this ratio is shown to be asymptotically equivalent to the expression obtained by Neyman and Johnson. The derivation given by the author is a straightforward application of the critical ratio principle to the difference between predicted criterion scores for the two groups. The order of the approximation of the ratio given here to the Neyman-Johnson ratio is that of the order with which the Beta distribution is approximated by the F distribution. B. J. Winer, University of North Carolina.


Bartlett, M. S., “The statistical significance 
of odd bits of information,” Biometrika, 
39 (1952), 228-37. 

The author presents a method of pooling information from n independent events to test a given hypothesis, H. If each event is a dichotomy, the total information is $i_n = -\sum \log p_i$, where $p_i$ is the probability of the event occurring, given H. The expected value and variance of $i_n$ are $E(i_n) = I_n = -\sum_i (p_i \log p_i + q_i \log q_i)$ and $\sigma_{i_n}^2 = \sum_i p_i q_i [\log(p_i/q_i)]^2$. All logarithms are to the base e. H is tested by treating the ratio $(i_n - I_n)/\sigma_{i_n}$ as a normal deviate. The results are extended to a multichotomy. An example is presented. Some observations are made on the use of this method for certain types of dependent observations. A brief discussion is given on the construction of confidence limits when the estimator is the ratio of random variates. R. L. Anderson, North Carolina State College.
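A minimal Python sketch of the pooled-information test for dichotomous events follows. The abstract states $i_n$ for events that occur. Here an event that does not occur is assumed to contribute −log q_i, which is consistent with the stated expected value $I_n$. The function name is hypothetical. If every p_i equals .5 the variance is zero and the ratio is undefined.

```python
import math


def pooled_information_test(probs, occurred):
    """Pool information from independent dichotomous events (natural logarithms).

    probs[i]    -- probability p_i under H that event i occurs (0 < p_i < 1)
    occurred[i] -- whether event i was observed to occur
    Returns (i_n, I_n, sigma, z), where z = (i_n - I_n) / sigma is referred to the normal table.
    """
    info = expected = variance = 0.0
    for p, hit in zip(probs, occurred):
        q = 1.0 - p
        info += -math.log(p if hit else q)  # information in the outcome actually observed
        expected += -(p * math.log(p) + q * math.log(q))
        variance += p * q * math.log(p / q) ** 2
    sigma = math.sqrt(variance)
    return info, expected, sigma, (info - expected) / sigma
```

Observing only highly probable outcomes gives a negative z (less information than expected), and observing improbable ones gives a positive z.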


Bliss, C. I., “Fitting the negative binomial distribution to biological data”; Fisher, R. A., “Note on the efficient fitting of the negative binomial,” Biometrics, 9 (1953), 176-200.

See Fisher, R. A. 


Cox, D. R., “Estimation by double sam- 
pling,” Biometrika, 39 (1952), 217-27. 

Double sampling methods are developed to obtain a large-sample estimator, t, of some parameter, θ, both for known and unknown population variance. The variance of t is assumed to be some function of θ, a(θ), given in advance. The results are applied to the estimation of normal and binomial means, including the case a(θ) = a. Estimation by confidence intervals and a combined estimation and testing procedure are also considered. R. L. Anderson, North Carolina State College.


Dixon, W. J., “Processing data for outliers,” 
Biometrics, 9 (1953), 74-89. 

The problem considered is how the mean, μ, and standard deviation, σ, should be estimated when N observations from a normal population N(μ, σ²) may actually contain some small unknown proportion γ of observations from a “contaminating” population. The contaminating population has either a different standard deviation, N(μ, λ²σ²), or a different mean, N(μ + λσ, σ²). The results in the paper are stated for sample sizes N = 5 and N = 15. The numerical results are based in large part on experimental sampling. The behavior of the mean and median (as estimates of μ), of s² (as an estimate of σ²), and of s and the range (as estimators of σ) is investigated for various values of λ and γ. The effect on the estimators of processing the data for outlying observations, using various levels of significance, is explored. For the whole range of λ and γ, and for sample sizes N = 5 and N = 15, recommendations are made as to the level of significance to use in processing data for outliers, and which estimates to employ. Lincoln Moses, Stanford University.
Draper, J., “Properties of distributions resulting from certain simple transformations of the normal distribution,” Biometrika, 39 (1952), 290-301.

Given a non-normal variate, x, to be transformed to a normal z with unit variance, the following transformations are considered: $S_L$: $z = \gamma + \delta \log(x - \xi)$; $S_U$: $z = \gamma + \delta \sinh^{-1} y$; $S_B$: $z = \gamma + \delta \log[y/(1 - y)]$, where $y = (x - \xi)/\lambda$. Methods of estimating the parameters γ, δ, ξ, and λ are given for $S_U$. Several examples are given, including its use for normalizing t and non-central t. A quadrature formula is suggested for estimating the moments of the $S_B$ system. The author states that the calculation of the parameters of the $S_L$ system “presents no difficulty using the first three moments of the distribution of x.” R. L. Anderson, North Carolina State College.
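The three transformations can be written directly as functions. In the Python sketch below the names are hypothetical. Parameter values must respect each system's range, for example x > ξ for $S_L$ and 0 < y < 1 for $S_B$. The sketch only maps x to z; it does not implement Draper's estimation methods.

```python
import math


def johnson_sl(x, gamma, delta, xi):
    """S_L (lognormal) system: z = gamma + delta * log(x - xi), for x > xi."""
    return gamma + delta * math.log(x - xi)


def johnson_su(x, gamma, delta, xi, lam):
    """S_U (unbounded) system: z = gamma + delta * asinh(y), with y = (x - xi) / lam."""
    return gamma + delta * math.asinh((x - xi) / lam)


def johnson_sb(x, gamma, delta, xi, lam):
    """S_B (bounded) system: z = gamma + delta * log(y / (1 - y)), y = (x - xi) / lam in (0, 1)."""
    y = (x - xi) / lam
    return gamma + delta * math.log(y / (1.0 - y))
```

Each map is invertible. For $S_U$, for instance, x = ξ + λ sinh((z − γ)/δ), so normal deviates z can be turned into $S_U$ variates and back.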


Fisher, R. A., “Note on the efficient fitting of the negative binomial”; Bliss, C. I., “Fitting the negative binomial distribution to biological data,” Biometrics, 9 (1953), 176-200.

The negative binomial distribution is a two-parameter distribution for a discrete random variable. It gives the probability of x occurrences of an event in a sampling unit as $\frac{(k+x-1)!}{x!\,(k-1)!} \cdot \frac{p^{x}}{(1+p)^{k+x}}$. For limiting values of the parameters this distribution yields the Poisson or the Fisher logarithmic series as special cases. The distribution is of wide application in the biological sciences, having been used to describe insect populations, distribution of bacterial clumps, accident rates, etc. This paper discusses various models leading to the negative binomial probability distribution, and to various alternative distributions more or less related to it. The mean of the distribution (which is pk) is efficiently estimated by the sample mean $\bar{x}$. The parameter k has been estimated by the method of moments, or from the number of zero occurrences. Fisher here gives the maximum likelihood estimates of the parameters together with a convenient arithmetical scheme of computation. Tests of fit to the model are discussed. The procedures are illustrated with numerical examples. Lincoln Moses, Stanford University.
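The Python sketch below gives the probability function in the parametrization above, which has mean kp and variance kp(1 + p). It also gives the method-of-moments estimate of k mentioned in the abstract, $\hat{k} = \bar{x}^2/(s^2 - \bar{x})$. This is not Fisher's maximum likelihood scheme, and the function names are hypothetical.

```python
import math


def neg_binomial_pmf(x, k, p):
    """P(X = x) = (k+x-1)! / (x! (k-1)!) * p**x / (1+p)**(k+x); lgamma allows non-integer k."""
    log_coef = math.lgamma(k + x) - math.lgamma(x + 1) - math.lgamma(k)
    return math.exp(log_coef + x * math.log(p) - (k + x) * math.log1p(p))


def moment_estimates(mean, variance):
    """Method-of-moments (k, p) from mean = k*p and variance = k*p*(1 + p); needs variance > mean."""
    k = mean**2 / (variance - mean)
    return k, mean / k


def sample_moment_estimates(data):
    """Apply moment_estimates to the sample mean and the (n - 1)-divisor sample variance."""
    n = len(data)
    mean = sum(data) / n
    variance = sum((v - mean) ** 2 for v in data) / (n - 1)
    return moment_estimates(mean, variance)
```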


Grundy, P. M., “The fitting of grouped 
truncated and grouped censored normal 
distributions,” Biometrika, 39 (1952), 252- 
59. 

A distribution is said to be censored if the frequency of observations in the truncated region is known although their values are unknown. A process involving adjusted sample moments, which is used in connection with published tables, is shown to be equivalent to maximum likelihood estimation. In the special case when the group intervals are equal, approximate formulas for the adjusted moments become particularly simple. The accuracy of the approximations is indicated. Information and covariance matrices are studied from the standpoint of effects of grouping. A numerical example illustrates the principles. T. W. Horner, North Carolina State College.


Gupta, A. K., “Estimation of the mean and 
standard deviation of a normal population 
from a censored sample,” Biometrika, 39 
(1952), 260-73. 


A sample may be censored in two ways: I. observations below or above a given point may be censored; II. the (n - k) smallest or greatest observations out of a sample of size n may be censored. The author was concerned with estimating the mean and standard deviation of a normal population from a type II censored sample. Tables were given which facilitate the computation of the maximum likelihood estimates and their asymptotic variances and covariances. Since the maximum likelihood estimates may be biased for small n, the best linear unbiased estimate was derived. Coefficients for finding the best linear estimate of the mean and the standard deviation from censored data for n ≤ 10 are given. An alternative unbiased linear estimate, which has great efficiency, was proposed for n slightly larger than 10. Examples illustrate the three methods of estimation. T. W. Horner, North Carolina State College.


Hyrenius, H., “Sampling from bivariate 
non-normal universes by means of com- 
pound normal distributions,” Biometrika, 
39 (1952), 238-46. 

The effect of non-normality on estimates of the correlation and regression coefficients and their variances is studied by considering each sample to be from a different bivariate normal universe. Only inequality in the means for the different universes is considered in this paper. R. L. Anderson, North Carolina State College.


Jacob, Walter C., “Split-plot half-plaid 
squares for irrigation experiments,” Bi- 
ometrics, 9 (1953), 157-75. 

The objectives of the experiment were to study the effects of nitrogen, phosphate, and potash (each at 3 levels) on the yield (in pounds) of U.S. No. 1 tubers for three varieties of potatoes in the presence or absence of irrigation. Altogether about two and a half acres were available, in two blocks of equal size. All combinations of varieties, fertilizers, and irrigation imply 162 different treatments. Since the main effect of irrigation was of little interest (but its interactions were of interest), the two blocks were each divided into two plots, one irrigated and one not. The 81 treatment combinations to be applied to the four split plots were arranged as a 9 × 9 quasi-Latin square by confounding portions of second- and third-order interactions with the rows and columns of the field. This removal of row and column sums of squares resulted in about a 100% gain in precision. Numerical analysis of the data and its interpretation are given. Lincoln Moses, Stanford University.


Johnson, N. L., “Approximations to the 
probability integral of the distribution of 
range,” Biometrika, 39 (1952), 417-19. 

Given a random sample of n from F(x), approximate formulas for the probability of the range not exceeding w are given. Comparisons are made with exact probabilities and significance levels for F(x) normal. R. L. Anderson, North Carolina State College.


Kaplan, E. L., “Tensor notation and the 
sampling cumulants of k-statistics,” Bio- 
metrika, 39 (1952), 319-23. 

Concise formulas are given for sampling from infinite populations. It is shown that results for multivariate distributions are mild generalizations of those for univariate relations. R. L. Anderson, North Carolina State College.


Kimball, A. W., “The fitting of multi-hit 
survival curves,” Biometrics, 9 (1953), 201- 
11. 

Let a population of organisms be exposed to a dose of radiation, x. Suppose that an organism loses its viability if and only if all of n “sensitive units” in the organism are inactivated, or hit. Further, assume that the probability of any one unit being hit is $1 - e^{-kx}$, and that hits on various units are independent. Then the probability of an organism losing its viability is $(1 - e^{-kx})^n$. If we write $u_i$ for the logarithm of the proportion of organisms not surviving at dose $x_i$, then $E(u_i) = n \log(1 - e^{-kx_i})$. The parameters to be estimated are k and n. An iterative method of solving the (non-linear) least squares equations is given. The asymptotic variance-covariance matrix (under normality assumptions) is given to permit approximate interval estimation. Lincoln Moses, Stanford University.
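Kimball solves the non-linear least squares equations iteratively. The Python sketch below swaps in a simpler alternative that is not Kimball's method. Because n enters the model linearly, it can be profiled out: for fixed k, the best n is a regression through the origin. The remaining one-dimensional search over k uses golden-section search. This assumes the residual sum of squares has a single minimum in the search interval. The names and the interval are hypothetical.

```python
import math


def profile_fit(k, doses, u):
    """For fixed k, fit n in E(u) = n*log(1 - exp(-k*x)) by least squares; return (sse, n_hat)."""
    g = [math.log(-math.expm1(-k * x)) for x in doses]  # log(1 - e^{-kx}), accurate for small kx
    n_hat = sum(ui * gi for ui, gi in zip(u, g)) / sum(gi * gi for gi in g)
    sse = sum((ui - n_hat * gi) ** 2 for ui, gi in zip(u, g))
    return sse, n_hat


def fit_multi_hit(doses, u, k_lo=1e-3, k_hi=10.0, tol=1e-10):
    """Golden-section search over k on the profiled residual sum of squares; returns (k_hat, n_hat)."""
    shrink = (math.sqrt(5.0) - 1.0) / 2.0  # about 0.618
    a, b = k_lo, k_hi
    c, d = b - shrink * (b - a), a + shrink * (b - a)
    fc, fd = profile_fit(c, doses, u)[0], profile_fit(d, doses, u)[0]
    while b - a > tol:
        if fc < fd:  # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - shrink * (b - a)
            fc = profile_fit(c, doses, u)[0]
        else:  # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + shrink * (b - a)
            fd = profile_fit(d, doses, u)[0]
    k_hat = 0.5 * (a + b)
    return k_hat, profile_fit(k_hat, doses, u)[1]
```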


Kruskal, William H., “On the uniqueness 
of the line of organic correlation,” Bio- 
metrics, 9 (1953), 47-58. 

For some purposes it may be convenient to represent a multivariate distribution by a single straight line. This paper considers the line passing through the mean of the distribution and having direction numbers proportional in absolute value to the standard deviations, and with their signs determined by the signs of the covariances. This is called the line of organic correlation. It is shown that this is the unique line based on first and second moments which transforms reasonably under omission of coordinates or under change of origin or scale, and which also provides the proper directions of association. If the multivariate distribution is normal, then the line is shown to maximize the probability of correct prediction in a certain sense. Certain geometrical properties of the line are proved (no assumption of normality is made). Problems of sampling are not considered. Lincoln Moses, Stanford University.
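In two dimensions the line of organic correlation passes through (x̄, ȳ) with slope ±s_y/s_x, the sign being that of the covariance. The Python sketch below computes it. The function name is hypothetical, and a zero covariance is arbitrarily given a positive slope.

```python
import math
import statistics


def organic_correlation_line(xs, ys):
    """Line through (mean x, mean y) with slope sign(cov) * s_y / s_x; returns (slope, intercept)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = math.copysign(sy / sx, cov)
    return slope, my - slope * mx
```

Unlike the two least squares regression lines, this line is unchanged (apart from the expected relabeling) when the roles of x and y are interchanged.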


Kupperman, M., “On exact grouping cor- 
rections to moments and cumulants,” Bi- 
ometrika, 39 (1952), 429-34. 

Corrections for the cumulants are given for the rectangular and triangular distributions; corrections for the mean and variance are given for the semi-triangular (right half of the triangular), parabolic, and exponential distributions. R. L. Anderson, North Carolina State College.


Lancaster, H. O., “Statistical control of 
counting experiments,” Biometrika, 39 
(1952), 419-22. 

Various random experiments were performed to study the adequacy of the χ² test of consistency of counts from a Poisson distribution with small mean and few counts (2-5). The author concludes that “χ² is likely to remain the method of choice in statistical control of counting, regardless of the size of the sample.” R. L. Anderson, North Carolina State College.


Leslie, P. H., “The estimation of population 
parameters from data obtained by means 
of the capture-recapture method. II. The 
estimation of total numbers,” Biometrika, 
39 (1952), 363-88. 
Methods are given for estimating the total numbers in a population under assumptions of constant and varying death rate and dilution of the population. The death rate is allowed to vary both in time and between different groups of animals. A preliminary analysis of a set of data is given. It provides a test of the absence of dilution and a method of obtaining approximate estimates from a long chain of samples. L. D. Calvin, North Carolina State College.


Lord, Frederic M., “An application of con- 
fidence intervals and of maximum likelihood 
to the estimation of an examinee’s ability,” 
Psychometrika, 18 (1953), 56-57. 

Given the performance of an individual on a series of fallible items which sample a specified ability, what is the best estimate of that individual’s “true” ability? The author seeks to construct a metric for measuring the ability underlying a test score that will remain invariant under presumably comparable measures of a given ability. The basic parameters in the estimation model are:
- $A_i$, a measure of item difficulty related to the proportion $p_i$ of examinees who answer the item correctly;
- c, a measure of true ability;
- $R_i$, the biserial correlation between the answer to item i and the true ability of examinees.

From these basic parameters, and assuming that the distribution functions needed are normal, the author derives expressions for the probability that individual a with ability level $c_a$ will answer item i correctly. Given these theoretical probabilities, the author obtains maximum likelihood estimates for $c_a$ and relates these estimates to the usual type of test score. It is shown that in the special case where all items in a test are of equal difficulty and are equally correlated with the ability measured, the maximum likelihood estimate is a simple function of the usual type of test score. Formulas for the standard error of the maximum likelihood estimates are obtained for conditions of the model. Relationships between these standard errors and the discriminating power of the test at various ability levels are determined. Procedures for estimating confidence intervals for the true ability score in terms of the test score are also given. For tests composed of equivalent items, the shortest confidence interval for the true score as a function of the test score is obtained for test scores slightly above the halfway point between a chance score and a perfect score. B. J. Winer, University of North Carolina.





Maritz, J. S., “Estimation of the correlation 
coefficient in the case of a bivariate normal 
population when one of the variables is di- 
chotomized,” Psychometrika, 18 (1953), 97- 
110. 

Consider a normal bivariate population in which one of the variates has been dichotomized and the other variate is continuous but restricted in some way. Then the biserial correlation coefficient is no longer a consistent estimate of the population correlation ρ. An estimate G, defined as $b/(1 + b^2)^{1/2}$, has been proposed to handle this latter case; here b is the estimated regression coefficient of the continuous variate on the dichotomized variate. The author has adapted the methods of probit analysis for estimating b for various cases of restriction in the continuous variate. The derivations of these methods are presented. Empirical sampling experiments from normal bivariate populations were carried out to obtain information on the sampling distribution of the coefficient G. Comparisons are made between variances of the probit estimates of the regression coefficients and those obtained from other estimates. The empirical results indicate that G is a more efficient estimate of ρ than is the biserial correlation, even in those cases where both coefficients are consistent estimates of ρ. B. J. Winer, University of North Carolina.


McIntyre, G. A., “A method for unbiased 
selective sampling using ranked sets,” 
Australian Journal of Agricultural Re- 
search, 3 (1952), 385-90. 

A novel method of sampling to estimate the mean value of a characteristic is presented for the case where measurements of the characteristic are expensive but it is easy to rank a sample with respect to it. For example, it may be easy to rank the plants on a certain area of ground with respect to height, weight, or crop yield, but considerably more expensive actually to measure any of these characteristics. The procedure is to form n independent random samples of size n each (i.e., draw a random sample of n² and divide it at random into n subsamples of n each). A final sample of n is then obtained by selecting the largest item from the first subsample, the second largest from the second subsample, and so on down to the smallest from the nth subsample. The precision of an arithmetic mean calculated from this final sample of n is considerably greater than for a simple random sample of n. The ratio of the variances for various population forms and sample sizes ranges from 1.33 (negative exponential distribution, n = 2) to 3.00 (rectangular distribution, n = 5). The author suggests (n + 1)/2 as the typical ratio. The paper also discusses estimation of second and higher population moments, use of a priori knowledge about distribution form, errors of ranking, and clustering of sets to simplify ranking. W. A. Wallis, University of Chicago.
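McIntyre's selection rule is easy to simulate. The Python sketch below assumes perfect ranking. It compares the variance of the mean from a simple random sample of n with that from a ranked-set sample of n. The function names, seed and replication count are illustrative.

```python
import random
import statistics


def ranked_set_mean(rng, n, draw):
    """Mean of one ranked-set sample: from the i-th of n fresh sets of n units, keep the i-th largest.

    Ranking is assumed to be error-free (units are ranked by their true values).
    """
    chosen = []
    for i in range(n):
        subsample = sorted((draw(rng) for _ in range(n)), reverse=True)
        chosen.append(subsample[i])  # i = 0 -> largest, i = n - 1 -> smallest
    return statistics.fmean(chosen)


def simple_random_mean(rng, n, draw):
    """Mean of a simple random sample of n units."""
    return statistics.fmean([draw(rng) for _ in range(n)])


def variance_ratio(n, reps, draw, seed=0):
    """Var(simple random mean) / Var(ranked-set mean), estimated by simulation."""
    rng = random.Random(seed)
    srs = [simple_random_mean(rng, n, draw) for _ in range(reps)]
    rss = [ranked_set_mean(rng, n, draw) for _ in range(reps)]
    return statistics.variance(srs) / statistics.variance(rss)


ratio = variance_ratio(5, 10_000, lambda rng: rng.random())  # rectangular population, n = 5
```

For the rectangular distribution with n = 5 the simulated ratio should come out near the 3.00 quoted in the abstract.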


Moore, P. H., “The estimation of the Pois- 
son parameter from a truncated distribu- 
tion,” Biometrika, 39 (1952), 247-51. 

A counter in a physical problem appeared to stick at certain numbers when counting radioactive particles, e.g., when there were more than r emissions in a given interval. The sample is thus truncated at a certain point, although the number of observations beyond this point is known. The author proposes a simple estimate of the Poisson parameter, $\bar{x} = \sum i n_i / \sum n_i$, where $n_i$ is the number of intervals with i emissions. The total number of observations beyond the truncation point is $N - \sum n_i$. This estimate is slightly biased (of order 1/N). The variance of $\bar{x}$ was also derived. This method of estimating the Poisson parameter was applied to two series of data and found to agree favorably with the maximum likelihood solutions. T. W. Horner, North Carolina State College.
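The summation limits of the ratio estimate are not stated in the abstract. One natural form, assumed here rather than taken from the paper, is $\sum_{i=1}^{r} i n_i / \sum_{i=0}^{r-1} n_i$. Both sums use only exactly recorded classes, and the expected numerator is m times the expected denominator, so the bias is of the ratio-estimator kind (order 1/N). The Python sketch below uses a hypothetical function name.

```python
import math


def truncated_poisson_ratio_estimate(counts, r):
    """Ratio estimate of the Poisson mean from classes i = 0..r that were recorded exactly.

    counts[i] = number of intervals with exactly i emissions (i = 0..r). Assumed form:
    sum_{i=1}^{r} i*counts[i] / sum_{i=0}^{r-1} counts[i].
    """
    numerator = sum(i * counts[i] for i in range(1, r + 1))
    denominator = sum(counts[i] for i in range(r))
    return numerator / denominator


# Illustrative check with expected (not sampled) frequencies: N = 1000 intervals, m = 2.3, r = 4.
m, r, N = 2.3, 4, 1000.0
expected_counts = [N * math.exp(-m) * m**i / math.factorial(i) for i in range(r + 1)]
estimate = truncated_poisson_ratio_estimate(expected_counts, r)
```

With exact expected frequencies the estimate reproduces m. Averaging only over the recorded classes, Σ i n_i / Σ_{i=0}^{r} n_i, would understate m.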


Rudra, A., “Discrimination in time series 
analysis,” Biometrika, 39 (1952), 434-39. 

A sequential test procedure is presented to decide if a given time series is of a random, autoregressive, or moving average type. If one of the latter two, it is shown to have the lowest possible order. The test procedure was applied to 28 series, and the results were compared with the decisions based on other techniques. The probabilities of making various decisions based on 100 operations were computed for two different known structures. R. L. Anderson, North Carolina State College.


Rushton, S., “On a two-sided sequential t- 
test,” Biometrika, 39 (1952), 302-8. 

A sequential procedure is given to test the hypothesis that the mean, μ, of a normal population is zero against the alternative that μ = ±δσ, where δ is fixed and σ must be estimated. The results are extended to the problem of testing the difference between two means. R. L. Anderson, North Carolina State College.
Skellam, J. G., “Studies in statistical ecology. I. Spatial pattern,” Biometrika, 39 (1952), 346-62.

A number of distributions arising in quadrat sampling are considered in relation to the underlying pattern of organisms. It is shown that the same distribution may arise from several quite distinct models. A few ways are briefly suggested, by considering additional evidence of a different kind, as to how to decide whether a given model is appropriate. L. D. Calvin, North Carolina State College.


Stevens, W. L., “Samples with the same 
number in each stratum,” Biometrika, 39 
(1952), 414-17. 


Some results are given on the efficiency of constant-number vs. proportional sampling. The approximate efficiency is $E = m^2(1 - F)/(m_2 - F m^2)$, where m is the first moment and $m_2$ the second moment (about zero) of the frequency distribution of the number of units per stratum, and F is the sampling fraction. R. L. Anderson, North Carolina State College.


Whittle, P., “Tests of fit in time series,” 
Biometrika, 39 (1952), 309-18. 

A general least squares test of fit of time series models is presented. The statistic is shown to be asymptotically distributed as χ²; in the limit, the statistic is the ratio of the geometric and arithmetic means of the residual variates’ periodogram. The examples included are concerned mostly with autoregressive schemes, but it is emphasized that the tests are appropriate for other methods of graduation. In examples, the test amounts to accepting the hypothesis giving the best fit. D. Gosslee, North Carolina State College.
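Only the limiting form of Whittle's statistic, the ratio of the geometric to the arithmetic mean of the periodogram, is sketched here in Python. The χ² calibration and the model fitting are omitted. The periodogram normalization and the names are assumptions, and a plain O(n²) transform keeps the code dependency-free. By the AM-GM inequality the ratio never exceeds 1. It is smaller when the periodogram is far from flat, as for residuals that still contain strong autocorrelation.

```python
import cmath
import math
import random


def periodogram(xs):
    """Periodogram at Fourier frequencies 2*pi*k/n, k = 1..(n-1)//2 (direct O(n^2) transform)."""
    n = len(xs)
    ordinates = []
    for k in range(1, (n - 1) // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(xs))
        ordinates.append(abs(s) ** 2 / (2 * math.pi * n))
    return ordinates


def gm_am_ratio(values):
    """Geometric mean over arithmetic mean; at most 1 by the AM-GM inequality."""
    mean_log = sum(math.log(v) for v in values) / len(values)
    return math.exp(mean_log) / (sum(values) / len(values))


rng = random.Random(7)
white = [rng.gauss(0.0, 1.0) for _ in range(512)]  # residuals from a well-fitting model
ar1, level = [], 0.0
for t in range(612):
    level = 0.9 * level + rng.gauss(0.0, 1.0)  # AR(1) with coefficient 0.9
    if t >= 100:  # drop a burn-in stretch
        ar1.append(level)

ratio_white = gm_am_ratio(periodogram(white))
ratio_ar1 = gm_am_ratio(periodogram(ar1))
```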


Williams, E. J., “Use of scores for the 
analysis of association in contingency 
tables,” Biometrika, 39 (1952), 274-89. 
A contingency table is considered with cell frequencies $n_{ij}$,
$i = 1, \dots, p$ and $j = 1, \dots, q$ ($p \ge q$), where
$\sum_i \sum_j n_{ij} = n_{..}$. If fixed scores are available for both
classifications, a simple formula is given to estimate the correlation
coefficient, $r$, to be used as a measure of association. The ratio
$(n_{..}-2)r^2/(1-r^2)$ is approximately distributed as $F(1, n_{..}-2)$,
where the numbers in the parentheses refer to the numerator and denominator
degrees of freedom of the variance ratio distribution. If only the $p$ row
scores are fixed, a method is given to estimate the $q$ column scores so as
to maximize the multiple correlation, $R$. In this case the ratio
$(n_{..}-q)R^2/[(q-1)(1-R^2)]$ is approximately distributed as
$F(q-1, n_{..}-q)$. When neither set of scores is fixed, $R$ and the scores
are estimated by a canonical analysis. The estimate of $R^2$ is the largest
latent root of a matrix whose elements are simple functions of the cell and
marginal frequencies. Tests of significance for a proposed set of scores are
presented for $q = 2$ and 3. An example is presented with $p = 4$ and
$q = 3$ and 4. Some indications are given in an appendix of the adequacy of
the approximations used. R. L. Anderson
North Carolina State College.
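For the case in which both sets of scores are fixed, a minimal Python sketch of the calculation follows. It assumes that the "simple formula" amounts to the ordinary product-moment correlation of row and column scores with each cell counted $n_{ij}$ times (the abstract does not give the formula), and then forms the ratio $(n_{..}-2)r^2/(1-r^2)$ to be referred to $F(1, n_{..}-2)$. The canonical-analysis case is not covered, and the function names and the table are hypothetical.

```python
import math


def score_correlation(table, row_scores, col_scores):
    """Product-moment correlation of row and column scores, counting each
    cell n_ij as n_ij observations (an assumption). Returns (r, n..)."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    sx = sum(x * t for x, t in zip(row_scores, row_totals))
    sy = sum(y * t for y, t in zip(col_scores, col_totals))
    sxx = sum(x * x * t for x, t in zip(row_scores, row_totals))
    syy = sum(y * y * t for y, t in zip(col_scores, col_totals))
    sxy = sum(x * y * nij
              for x, row in zip(row_scores, table)
              for y, nij in zip(col_scores, row))
    cov = sxy - sx * sy / n
    r = cov / math.sqrt((sxx - sx * sx / n) * (syy - sy * sy / n))
    return r, n


def variance_ratio(r, n):
    """(n.. - 2) r^2 / (1 - r^2), approximately F(1, n.. - 2)."""
    return (n - 2) * r * r / (1 - r * r)


table = [[10, 5],
         [5, 10]]  # hypothetical 2 x 2 table of frequencies
r, n = score_correlation(table, [0, 1], [0, 1])
print(r, variance_ratio(r, n))
```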


Youden, W. J., and Connor, W. S., “The 
chain block design,” Biometrics, 9 (1953), 
127-40. 

Most experimental designs used in agriculture, biology, psychology, etc., involve
a fairly high degree of replication, which is
appropriate because of the magnitudes of 
variability encountered. Physical measure- 
ments (e.g., spectroscopic determinations) 
are ordinarily made with far greater precision, and for them
a high degree of replication represents a waste of resources. The chain block
design is a very elastic arrangement calling 
for two determinations (each in a different 
block) on some treatments, and only one on 
the others. This results in an overall degree
of replication which lies between one and 
two. Methods for layout and analysis are 
given; practical considerations influencing 
choice of layout are considered; a numerical 
example (42 “treatments,” of which twelve 
are repeated, in three blocks) is worked out 
in detail. Lincoln Moses, Stanford University.
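The overall degree of replication in the worked example follows from simple counting: twelve of the 42 treatments are determined twice and the remaining 30 once. A short Python sketch (the function name is hypothetical):

```python
def chain_block_replication(n_treatments, n_repeated):
    """Total determinations and mean replication when n_repeated treatments
    are measured twice (in different blocks) and the rest only once."""
    determinations = 2 * n_repeated + (n_treatments - n_repeated)
    return determinations, determinations / n_treatments


total, mean_rep = chain_block_replication(42, 12)
print(total, round(mean_rep, 3))  # 54 1.286
```

The mean replication of about 1.29 lies between one and two, as the abstract states.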





BOOK REVIEWS 


Cyclical Movements in the Balance of Payments. Tse Chun Chang. Cambridge 
(England): Cambridge University Press, 1951. Pp. x, 224. $3.75. 


See the article by Solomon Fabricant, pp. 79-87 in this issue. 


Demand Analysis: A Study in Econometrics. Herman Wold in association with 
Lars Juréen. New York: John Wiley and Sons; Stockholm: Almqvist and
Wiksell, 1953. Pp. xvi, 358. $7.00. 


See the article by H. S. Houthakker, pp. 88-96 in this issue.


Facts from Figures. M. J. Moroney. Baltimore, Maryland: Penguin Books, 
Inc., 1953. Pp. 472. $0.85. 


This constitutes a minor revision in content, but a major downward revision
in price, of the volume reviewed by M. A. Girshick in last September’s
issue of this Journal (Vol. 48 (1953), 645-47). The changes which have
been made in content are described by the following sentence from the 
Preface: “The contents remain almost unchanged, except for the latter part 
of Chaper [sic] II which I have revised to include a new approach to modified 
limit control charts.” Changes which have not been made are described in 
the following two sentences: “I am sorry still to remain persona non grata to 
the index number men and the fortune tellers, but there it is. I give way to 
none in my admiration for the theory (may its shadow never be less!), but 
when it comes to a great deal of the practice I simply cannot help chuckling.” 

W.A.W. 


The Application of Operations Research to Industry. Ellis A. Johnson (Director, 
Operations Research Office, Johns Hopkins University, Chevy Chase, Mary- 
land). Published by the author, 1953. Pp. 61. Paper. Free of charge. 


A. W. Swan, Courtaulds, Ltd., Coventry, England 


The writer finds this an exasperating publication because parts of it are
so extraordinarily good and provocative, while parts seem to wander off 
into complexities that have little useful interest to the worker in Operations 
Research. 

The introduction is one of the best parts of the book, with a penetrating 
analysis of the basic thinking in O.R. The author points out that the methods 
of Operations Research are closer to those of the basic sciences than to those 
of engineering, but that the techniques and methodology have much in 
common with those of industrial engineers and management consultants. 
He goes on to say that O.R. has been concerned from the start with the 
decision-making system in general, and with the problem of providing
individual executives with management advice. He considers that one of the
main contributions to Operations Research is the use of the team and that 
it has, more consciously than industrial engineering, developed action- 
models based on fundamental theory. He also feels that it has relied much 
more upon complex mathematical concepts and techniques and has realised 
more fully than industrial engineers the necessity of estimating the uncer- 
tainty of its predictions. “Operations Research places a particular demand 
on the analyst’s ability to translate his findings into language which simply 
and clearly sets forth the values, effectiveness and costs of a set of proposed 
courses of action.” 

After this stimulating introduction the author has a chapter, which the 
writer finds baffling, on the “Relation of the Operations Analyst to the 
Executive,” with a set of diagrams illustrating the interactions of various 
departments and factors. The ordinary O.R. worker would consider it a 
waste of time to set these down. 

The following sections, “The Operations Checklist for Solving Action Prob- 
lems” and “Planning Detailed Operations” set out principles in diagrammatic 
form, and these diagrams are presumably correct, but they are, from the 
analyst’s point of view, what “Punch” calls “glimpses of the obvious,” since 
they are so much taken for granted by the O.R. analyst and industrial en- 
gineer that they have become sub-conscious and do not need to be shown as 
complex diagrams. 

Chapter III gives a brief description of “Some Selected Analytical Tools.” 
Unfortunately, statistical method, certainly the most useful and widely ap- 
plied O.R. tool today, is dismissed in a brief paragraph as being well outlined 
in Morse and Kimball—an opinion which might not be shared by everyone. 
Statistical method is not the only new tool used by the O.R. worker to 
distinguish him from the industrial engineer who preceded him, but it can 
be stated with some confidence that a very large proportion of present day 
industrial O.R. work is based upon the statistical approach; the whole 
gamut of statistical method is used and the statistician is an essential mem- 
ber of any worthwhile Operations Research Department. At this point, any 
publication on Operations Research must necessarily devote a good deal of 
attention to the statistical approach and the methods of applying statistical 
thinking to industrial problems. 

The next tool mentioned is Symbolic Logic, but unfortunately the example 
given appears to be one of very few in which this tool has been applied, and a 
description of the same example is given in Factory, October, 1953. We 
then proceed to the “Theory of Value” and while this is doubtless a useful 
method, the writer is not aware of any example in which it has been applied. 
The following two sections are “Queueing Theory” and “Stochastic
Processes,” both of which are, in effect, subsections of statistical method. 
The “Theory of Games,” the next section, is, in the opinion of a large number 
of O.R. workers, a highly important potential tool, to be used mainly in 
conjunction with what is known as linear programming, and there is a considerable
literature on the subject. The practical use of this kind of thinking
has, however, not yet proceeded very far, and this again is a potential rather 
than actual method. 

We then have Chapter IV which gives examples. Unfortunately we start 
with some “polishing of medals” with two examples taken from wartime 
practice. The next example is taken from agriculture—not perhaps the most 
useful for the industrialist. There is also included a brief reference to the 
massive problem relating to the standard of living in Puerto Rico, interesting, 
but not of much immediate use to the potential O.R. worker. The author 
then includes a summary of an excellent paper by John F. Magee of Arthur D. 
Little, Inc., “The Effect of Promotional Effort on Sales”—the only purely 
industrial example. 

In the final brief chapter on the difficulties met in Operations Research the 
author returns to his penetrating analysis and the result is excellent. The 
first difficulty mentioned is that of communication between the scientist and 
the executive. The O.R. analyst has to learn the executive’s language and 
how to translate into that language. “An operations research study becomes 
effective in proportion to the amount of effort spent in communicating the 
effects of the research and clearing up with the executive on a personal basis 
all the questions involving the validity of the study. Since very few analysts 
are adept at, or recognise the need for such ability on their part, the results 
of much good O.R. are never used.” The author lists other difficulties in 
which he includes the extreme difficulty in getting highly skilled specialists 
from very diverse and often antagonistic disciplines to work well as a closely 
integrated team. This difficulty can, however, be readily overcome by adopt- 
ing for every job the simple plan of having the O.R. analyst in charge form a 
team, consisting of the appropriate members of his own staff and the tech- 
nicians and other specialists whose knowledge will be most valuable, on the 
basis that the team will work together for a common aim, and that each 
member of that team will stand to gain personally in kudos. 

In connection with this review it is fair to point out that the subject of 
Operations Research is a thorny one. There is, today, no completely satis- 
factory book on the subject and anyone who has the courage to tackle the 
subject as Ellis Johnson has done deserves praise, especially if he has suc- 
ceeded in giving useful suggestions, as is certainly the case in this book. 


The Revision of the Rapid Transit Fare Structure of the City of New York. 
William S. Vickrey. New York: Mayor’s Committee on Management Survey of 
the City, 1952. Pp. xii, 156. Paper. 


William R. Buckland, London Transport Executive¹


This report is the third of the Technical Monographs from the Finance
Project of the Mayor’s Committee on Management Survey of the City of 





1 The views expressed are purely those of the reviewer and are not necessarily the views of the 
organization. 
New York. Its author, one of the Staff Members of the Finance Project, is
an Associate Professor of Economics at Columbia University. 

After a short introduction on the background of the problem, the report 
puts forward a marginal costs basis for deciding upon a fare structure to pro- 
mote the optimum utilization of the particular transit facilities under dis- 
cussion. This theme is further developed in Chapter 4 (and its Appendix) 
although it is preceded by a chapter on “Patterns of Traffic,” which is
largely an account of trying ‘to make bricks without (statistical) straw,’
and followed by a development of this traffic theme along the lines of the
difficulties of adjusting services to conform to the traffic pattern. The mechanics
of fare schemes and collection devices are dealt with in Chapters 6, 8 and 9 
while the intervening Chapter 7 again develops the traffic theme in relation 
to fare changes. Finally there are two short chapters on considerations of 
equity and what may be called general Social Planning. 

The economic theme of the report may perhaps be illustrated by two ex- 
tracts from pages 4 and 5: 


Since fares must necessarily be set in advance and announced to potential 
passengers if they are to have the proper effect upon the passenger’s decision 
to travel or not to travel at a given time and place.... 


Only if the fare fully reflects at all times and between all points the costs 
of carrying additional passengers will the fare structure achieve an efficient 
utilization of the facilities... . 


The development of the principle ultimately produces a set of proposals on
the desirable fare structure under which the intentions expressed in the first
of these extracts would be difficult to carry out, and which would bewilder
the travelling public even if they could be carried out.

The important point of the non-monetary costs of travel in the form of 
fatigue and time costs is brought out very well but the effect is somewhat 
marred when it is recorded that: 


the passenger would be willing, in order to avoid the inconvenience, to pay 
an amount that would cover the additional money costs (of providing addi- 
tional services) .... 


The idea of passengers being willing to pay more money is quixotic since 
these hedonic cost components are capable of infinite variation as between 
individuals as well as in time and space. Given a fare structure and a pattern 
of services, the traveller will tend to minimize his total costs—monetary and 
non-monetary—according to the needs of the moment and changes in these 
needs are, at this level of detail, part of the random fluctuations which are 
present in all patterns of travel. Therefore it is suggested that the fare struc- 
ture which will help the traveller most in this task is one for which the money 
cost is virtually the same for comparable distances at any time of the day. 
In this way the passenger has the greatest opportunity to work out his own 
salvation and the traffic flows unconstrained by differential fares. It must be 
said, however, that this kind of scheme requires a degree of assimilation of 
fare structures for all the major forms of public passenger transport which
may not at present be available in New York. It seems to this reviewer that 
the disadvantage with schemes for differential fares according to time and
place—such as are largely advocated in this report—is that as soon as a 
fare structure is promulgated its operation tends to change the flows of 
traffic upon which it is based and it then becomes necessary to change the 
fare structure if the optimum position is to be maintained. Thus, the fare 
to be paid for a given journey becomes a matter of some doubt for the travelling
public, and this must be psychologically undesirable.

To deal with some rather more practical issues: on page 10 there is an 
equation expressing operating expenses in terms of certain physical operating 
characteristics. The equation appears to be the usual form of multiple re- 
gression equation assuming no interaction between the various characteris- 
tics but it has, in fact, been developed by the non-statistical process of allo- 
cating the operating expenses to the various physical characteristics of 
operation. For example, the salaries of motormen are allocated to the char- 
acteristic of train miles. This may have been the only reasonable method 
available but it would have been desirable to have given more space to the 
interpretation in order to avoid possible confusion. In Chapter 3 there does 
not appear to be any reference to the survey work on travel in New York 
which has been done by various organizations in connection with trans- 
portation advertising and which should have been useful for this investiga- 
tion. In connection with the availability of data for this study it would have 
been useful to have put forward some recommendations on how the various 
deficiencies might be filled. The detail in Chapter 9 of the various collection 
devices associated with the different kinds of fare structures discussed in 
this report gives the reviewer a distinctly unhappy feeling as to their
practicability both from the engineering and the commercial point of view. There
appear to be far too many things capable of going wrong. These vary from 
the station staff not changing something vital at the right time, through the 
peak-period passenger not having ready the right coins, to the mechanical 
and electrical devices necessary to display illuminated signs for the currently 
required fare (with gongs to signal the alterations) and the amount of change 
the passenger may expect to receive as a result of his not having the correct 
coins available. 

The considerations of equity and social planning at the end of the report 
begin to place the whole subject into a perspective in which passenger trans- 
port takes on the aspect of something which is very much concerned with 
the human business of living, working, and playing. Transport is only a 
means to an end and that end may well be the optimum utilization of the 
human and material resources of a given area, say New York. Surely, since 
this investigation was worthwhile, it should have been approached in the 
spirit of an operations research project, for its solution demands the welding 
of economics to transport operation and engineering with statistical meth- 
ods as the flux. As it is, an undue concentration of attack on the first of 
these has produced a report which will probably make the transport operator
and his engineering colleagues shudder: the statistician will merely lament 
once more that most of the data necessary for the problem were not avail- 
able. 


Measuring Your Public Relations, a guide to research problems, methods and 
findings. Herman D. Stein. New York: National Publicity Council for Health
and Welfare Services, Inc., 1952. Pp. 48. $1.25. 


Marie Jahoda, New York University


The aim of this booklet is to provide the professional personnel in health
and welfare organizations with a balanced view of what research can do 
for their agencies and to describe different research procedures adequate for 
different types of problems. Mr. Stein does not want to “sell” research; 
rather he wishes to enable persons concerned with the practical work of 
health and welfare agencies to decide for themselves what research they 
need. 

With this aim in mind he discusses: the nature of the problems which 
arise in the public relations of voluntary agencies; informal research tech- 
niques (they might better be called fact-finding techniques) which an agency 
can apply largely without expert help; pre-testing of written material; the 
value and limitations of public opinion polls; communications research, etc. 

The presentation of these matters is straightforward without being over- 
simple. There are many examples used in the text to illustrate a point. A 
small list of selected references on a more technical level completes the pub- 
lication. 

It lies in the nature of such a publication that it offers no new ideas. The 
research person who is about to embark for the first time on a study for or 
of an action agency might find it helpful, however, to glance through this 
booklet. Whether or not it fulfills its aim with its actual target audience is 
in itself a question for research in public relations. 


The WOI-TV Audience. Mimeo Series No. 1. Ames, Iowa: Statistical Labora- 
tory, Iowa State College, 1952. Pp. 125. Paper. 


Lester R. Frankel, Alfred Politz Research, Inc.


The WOI-TV Audience is a statistical report describing the size and char-
acteristics of the television audience located within a 50 mile radius of 
Ames, Iowa. The data are based upon a sample survey, and were obtained 
for the purpose of establishing bench mark data against which future sur- 
veys may be compared. The text material in this report is a description of 
the methods used to obtain the data. 
The survey design, the questionnaire, the sample plan, and the field operations
were the responsibility of the Statistical Laboratory, Iowa State College
and, as is to be expected, the techniques used appear to be superior to
those employed in the usual stereotype commercial television audience 
study. 

The sampling procedure was designed in such a manner as to make full use 
of the resources that were available. The sample was of the multiphase, 
single-stage type. After a sample of households had been selected, household 
characteristics data relating to the ownership of TV sets were obtained. In
the second phase, additional data were obtained from all television house- 
holds, 25 per cent of the non-television households and 50 per cent of all the 
adults in these two groups of households. 
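Because the second phase retained households and adults at different rates, estimates from it have to be weighted by the reciprocals of the retention rates. The Python sketch below shows how such expansion weights could be formed from the rates the reviewer reports (all television households, 25 per cent of non-television households, 50 per cent of adults). It is an illustration, not the Laboratory's procedure: it assumes the adult subsample is drawn independently within retained households, and `first_phase_weight` is a hypothetical parameter standing for whatever weight a household carries from the first phase.

```python
# Second-phase retention rates as reported in the review.
HOUSEHOLD_RATE = {"tv": 1.00, "non_tv": 0.25}
ADULT_RATE = 0.50


def expansion_weight(household_type, adult=False, first_phase_weight=1.0):
    """Reciprocal of the overall selection probability for a second-phase
    household (adult=False) or for an adult within it (adult=True).
    Assumes adults are subsampled independently within retained households."""
    p = HOUSEHOLD_RATE[household_type]
    if adult:
        p *= ADULT_RATE
    return first_phase_weight / p


for kind in ("tv", "non_tv"):
    print(kind, expansion_weight(kind), expansion_weight(kind, adult=True))
```

Under these assumptions an adult in a non-television household stands for eight times as many first-phase units as a television household does.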

Single-stage sampling was employed in the selection of the households to 
be included in the sample. A sample of 400 segments was selected to repre- 
sent the survey area, and all households within each of the designated seg- 
ments (approximately 6) were included in the first phase sample. Of par- 
ticular interest to the practicing sampling statistician is the discussion of 
the uses of the Master Sample Maps, the City Directory, photocounting, 
and the methods of cruising for the determination of segment sizes. 

Aside from the problem of sample design (which is merely a blueprint) the 
problems of the execution of the survey are discussed in detail. The training of interviewers
was particularly important in view of the fact that the second phase of 
sampling was accomplished by the interviewers at the field level. In addition, 
since some of the questions on the questionnaire dealt with the respondent’s 
activity on the day before the interview, it was necessary to spread the inter- 
viewing equally over all seven days of the week. 

The format and organization of the report as well as the presence of some 
minor arithmetical inconsistencies tend to detract from the impressiveness 
of the study. However, it is clear from the description that the design was 
not intended to produce a quality impression but a quality study in the most 
efficient manner possible. For example, ten years ago it was more or less 
assumed that the steps in selecting a sample should follow the time consum- 
ing sequence of primary unit selection, segment selection, listing of addresses 
by interviewers, final household selection in the central office from the often 
shaky prelistings, and interviewers revisiting the selected locations to inter- 
view. In the study of the WOI-TV audience there was an efficient utilization 
of man hours. Real inventiveness based on genuine statistical knowledge
obviously played a role in finding an additional method by which the household
and individual selection was accomplished in a single field operation.


Social and Psychological Factors Affecting Fertility. Volume Three. P. K. 
Whelpton and Clyde V. Kiser. New York: Milbank Memorial Fund, 1952. $1.00. 
Paper. 


E. Lewis-Faning, Welsh National School of Medicine


Fifteen years ago, Raymond Pearl (1939) in his book “The Natural
History of Population” calculated pregnancy and live birth rates for 
various groups of women, differentiating those who had, and those who had
not used contraceptive methods. He discussed among other things the effects 
of contraceptive efforts on the pregnancy rates in relation to economic status, 
education and religion. 

Ten years later, the Royal Commission on Population (1949) compared 
pregnancy rates for periods of reproduction in which (a) no birth control 
was used, (b) contraception was abandoned, (c) contraception was being 
used. They also compared the average desired and actual size of the family, 
and the number of unwanted children for groups of women classified ac- 
cording to the degree of success attained in planning and spacing a family by 
contraceptive methods. 

The collection of seven papers here reviewed goes still more deeply into 
the subject and inquires what social and psychological factors contribute 
towards successful family planning. Is it those who feel most economically 
insecure who successfully restrict the size of their families? Is it those who 
plan other aspects of their lives? Is it those with poor health or those with 
a feeling of personal inadequacy? These are among the problems which con- 
stitute the subject matter of these papers. 

It will be noted that in both the earlier publications the indices used were
quantitative or, if not, were definitely factual (grade of education reached,
or religious denomination), and that to such data statistical methods and
reasoning could legitimately be applied. The reader’s reactions to the volume
under review must depend on whether or not he accepts that indices of such 
nebulous qualities as “a feeling of economic insecurity” or “a tendency to 
plan in general” or “a feeling of personal inadequacy”—indices to which 
statistical methods and reasoning have here also been applied—have been or 
indeed can be satisfactorily constructed. 

One example will illustrate the dubiety which the reader should not fail 
to feel. Replies by 1444 wives to the question “Do you plan your buying to 
take advantage of sales?” were tabulated as below, showing the distribution 
according to success in family planning (“A” being the most successful) of 
the group of women giving each specific answer. 

                        Percentage distribution by
Plan to buy             fertility-planning status
at sales                    B           D
Very often                 13          24
Often                      17          28
Sometimes                  13          24
Seldom                     11          39
Very seldom                 7          52

On this, the authors comment: “One group of wives answered ‘Very
Often.’ Among these, 44% were in the effective fertility planning categories
(A and B). Only 28% of the wives answering ‘Very seldom’ were in the
effective fertility-planning groups.”

Can answers such as these be used as indices of “a tendency to plan in 
general”? If one has been reared in the belief that most of the goods offered 
in sales are manufactured especially for sales and are generally of an inferior 
quality, one plans not to buy at sales. Nevertheless, if there is a large family, 
the wife may be forced to buy in the cheapest market. Does this make her a 
(good) planner? In fairness, it must be stressed that in the instance cited, 
the conclusions are based on the replies to many more questions of the same 
type than the one used as illustration. The volume as a whole must contain 
some hundreds of tables like the one quoted. There is no lack of data and 
the arguments from interpretation of the figures, although involved, seem 
reasonable, always provided that the indices used do really measure what 
they are supposed to be measuring. On this fundamental matter some scepti- 
cism is not unjustified. 

In one section, scores are allocated to sets of replies to different questions, 
and Pearsonian correlation coefficients calculated as a measure of the inter- 
correlation between the different indices used, in spite of there being no evi- 
dence that the intervals represented by differences between the scores were 
uniform. Is the difference between “Very often” and “Often” equivalent 
to the difference between “Often” and “Sometimes”? 
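The reviewer's question can be made concrete: a Pearson coefficient computed from category scores depends on the spacing chosen for those scores. The Python sketch below uses a small hypothetical table (not data from the volume under review) and compares equally spaced scores with scores that treat the gap between "Very often" and "Often" as twice as large; the function name is hypothetical.

```python
import math


def scored_correlation(table, row_scores, col_scores):
    """Pearson correlation of row and column scores, weighting each cell by its count."""
    cells = [(x, y, count)
             for x, row in zip(row_scores, table)
             for y, count in zip(col_scores, row)]
    n = sum(c for _, _, c in cells)
    mx = sum(x * c for x, _, c in cells) / n
    my = sum(y * c for _, y, c in cells) / n
    cov = sum((x - mx) * (y - my) * c for x, y, c in cells)
    vx = sum((x - mx) ** 2 * c for x, _, c in cells)
    vy = sum((y - my) ** 2 * c for _, y, c in cells)
    return cov / math.sqrt(vx * vy)


# Hypothetical counts: rows = "Very often", "Often", "Sometimes";
# columns = effective planners, other couples.
table = [[20, 10],
         [15, 15],
         [10, 20]]
effective = [1, 0]
r_equal = scored_correlation(table, [2, 1, 0], effective)    # equal intervals
r_unequal = scored_correlation(table, [3, 1, 0], effective)  # wider top interval
print(round(r_equal, 4), round(r_unequal, 4))  # 0.2722 0.2673
```

The same frequencies yield different coefficients, so the correlations are only as meaningful as the assumed spacing of the scores.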

Qualified statisticians reading these papers can safely be left to utilize 
their professional knowledge and experience in assessing the statistical 
validity of the conclusions reached but there is a real danger that the 
statistically unsophisticated (who comprise possibly the majority of workers 
in this field) will be overawed by the facade of diagrams and statistical tables 
into accepting the conclusions as authoritative and final. To such readers two
things need pointing out: first, that the 1444 couples whose reproductive and
contraceptive histories, and whose personal opinions as to the impact of their
economic circumstances and psychological characteristics on their desire for
children, form the data common to all seven papers are, although a homogeneous
group and therefore presenting certain advantages for studies of this type, far
from being representative of any past or present community in the U.S.A.;
second, that the answers recorded on the questionnaires were given as long 
ago as 1941-42 by couples married in the years 1927-29 and reared in the 
traditions of 1900-10. Strictly, then, the replies should be interpreted in the 
light of the economic circumstances that prevailed when they were building 
their families—mainly in the era of depression of the late 20’s and early 
30’s—rather than in the light of those of the 1950’s. It is not clear that the
authors have records enabling them to do this, or indeed that they realize the
necessity of attempting it. Many aspects of life changed even between
1941 and 1951. 
Statisticians, especially those interested in the field of human fertility,
would be rendering a service by reading and commenting on this publication. 


REFERENCES 


Raymond Pearl, The Natural History of Population, Oxford University Press, 
London, 1939. 

Papers of the Royal Commission on Population, Vol. 1, Family Limitation and


Community Wage Patterns. Frank C. Pierson. Berkeley and Los Angeles: University
of California Press, 1953. Pp. xvii, 213. $3.75.


Mitchell O. Locks, University of Oklahoma


This volume is an attempt to determine the nature and causes of relationships
between post-World War I wage developments in Los Angeles
County, California, and those in other large metropolitan areas. The book has
chapters on each of the following topics: Pre-1940 Wage Levels; 1940-1949 
Wage Levels; Local Influences on Wage Levels; Industry Influences on Wage 
Levels; Relationship Between Employment and Wages; Relationships Be- 
tween Investment, Productivity, and Wages; and The Influence of Unions on 
Local Wages. As can be seen from the titles, each chapter covers a topic so 
important that it could be a separate study in itself. 

The author has used a great wealth of material assembled from many 
different sources, principally other books about California. However, with a 
study having such broad scope confined to only 164 printed pages (besides 
Appendix Tables), it is not surprising that at times the author seems to 
wander aimlessly in a forest of secondary statistics. The reviewer received 
the impression that the author was not sure of what his data showed him, 
and therefore had to state many of his conclusions without strong conviction. 

An example of this may be found in his analysis of the relationship between 
Employment and Wages. He computed certain rank correlation coefficients 
between employment and wages for the period from 1929 to 1939, and also 
for the period from 1940 to 1949. His average rank correlation coefficient for 
six cities for the earlier period was statistically significant (although the indi- 
vidual rank correlation coefficients for each of the six different cities for that 
period were not), while the average rank correlation coefficient (as well as 
the six individual rank correlation coefficients) for the later period was not
significant. On the basis of this very meagre evidence, the author makes the 
following statement without further explanation: 


This suggests that industry employment and wage levels move together when 
there is a large amount of unemployment, but that these two variables bear
no consistent relationship to each other when labor market conditions are 
relatively tight. (p. 116) 
This statement contradicts the widely-held view that the elasticity of
labor supply with respect to wage-rates is high in depression. By not
explaining the reasons for this observation further, the author missed an
opportunity to make a significant contribution to wage theory.
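For readers who want to reproduce calculations of this kind, the Python sketch below computes a Spearman rank correlation (the Pearson correlation of the ranks, with tied values given their average rank) for each city and then the simple mean over cities. How the author averaged the coefficients and tested the average is not stated in the review, so the unweighted mean is an assumption; the function names and the figures in the example are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)


# Hypothetical percentage changes (employment, wages) for five industries per city.
cities = {
    "City 1": ([3.0, 7.5, 1.2, 4.4, 9.0], [2.1, 5.0, 1.0, 2.9, 6.3]),
    "City 2": ([5.5, 2.0, 6.1, 3.3, 4.0], [1.5, 2.2, 3.1, 0.9, 2.0]),
}
rhos = {name: spearman(emp, wage) for name, (emp, wage) in cities.items()}
mean_rho = sum(rhos.values()) / len(rhos)
print(rhos, round(mean_rho, 3))
```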

Another example of inadequate explanation occurs on page 61: 


... hourly earnings in Los Angeles manufacturing rose less rapidly than 
in the United States between 1945 and 1949 (30 per cent as against 37 
per cent). 


However, data obtained from Appendix Table I show that Los Angeles retained
its ranking in average manufacturing wage levels during the period
from 1945 to 1948. (Comments concerning the validity of some of the data 
in Appendix Table I will be found later in this review.) These data purport to 
show that Los Angeles was seventh in average manufacturing industry wage 
levels in a group of 20 large cities in both April, 1945 and April, 1948. Al- 
though this discrepancy in findings made at two different points in the book 
is a relatively minor one, the author should have furnished an explanatory 
note. 

The overall statistical facets of this book inspire the following comments: 

(1) Applications of the Analysis of Variance. In Chapters VI and VII the 
author performed analyses of variance on eleven different manufacturing 
industries in six different cities with respect to percentage changes in em- 
ployment, average annual earnings per worker, and value added per worker 
from 1929 to 1939. Separate analyses were performed for each of these three 
characteristics using the “F” test, and the results were reported in Tables
11 and 20. However, examination of Tables 8 and 19 shows that the same
data were ranked within communities for the purpose of computing rank
correlation coefficients. Thus the data in those tables were already in a form
such that, with a negligible amount of additional calculation, a “Friedman”
rank analysis of variance could have been performed.¹ The reviewer submits
that the latter method would be preferable because it avoids the normality
assumption implicit in the “F” test.
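Friedman's procedure, as the reviewer proposes it, can be sketched in Python. The sketch assumes that the cities are the blocks within which the data were ranked and that the industries are the treatments. The statistic is referred to a chi-square distribution with k - 1 degrees of freedom, no correction for tied ranks is applied, and the three-by-three table in the usage line is hypothetical.

```python
def average_ranks(values):
    """1-based ranks of `values`; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def friedman_statistic(table):
    """Friedman's chi-square_r for table[block][treatment] (e.g. table[city][industry]).

    Observations are ranked within each block (row); with R_j the rank total of
    treatment j, the statistic is 12 * sum(R_j^2) / (b k (k + 1)) - 3 b (k + 1).
    """
    b, k = len(table), len(table[0])
    rank_totals = [0.0] * k
    for row in table:
        for j, r in enumerate(average_ranks(row)):
            rank_totals[j] += r
    return 12.0 * sum(R * R for R in rank_totals) / (b * k * (k + 1)) - 3.0 * b * (k + 1)


# Three hypothetical cities that order three industries identically:
# perfect concordance gives the maximum value b (k - 1) = 6.
print(friedman_statistic([[10, 20, 30], [1, 2, 3], [5, 6, 7]]))  # prints 6.0
```

Because only the within-city ranks enter, the data already prepared for rank correlations suffice, which is the reviewer's point about the negligible extra calculation.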

(2) Appendix Table I. There is reason to doubt the validity of the data in 
the last column of Appendix Table I. Since much of the analysis of wage 
movements in Chapters II and III is based on that table, there should be 
further explanation of how those data were obtained. The reasons are: 

(a) The source given for the data in that table is a book published in
1946.² Yet the fourth column gives manufacturing wage indices for April,
1948.

(b) The data in the first three columns of that table give, respectively,
manufacturing wage indices for 20 different cities for April, 1941; April, 1943;
and April, 1945, each adjusted to an average index of 100 for its date.
However, column 4 has an average index of 144 for the 20 cities. Thus it would
appear that a different basis was used for calculation of the April, 1948
indices than had been used for the earlier three dates.

The reviewer believes that, despite the shortcomings indicated, the
author made a contribution to the study of wage patterns. In large measure,
the scope of his analysis is limited by the fact that he had to use secondary
data. However, there is a need for more studies of this type using more primary
data than are now available for this purpose.

1 Milton Friedman, “The use of ranks to avoid the assumption of normality implicit in the analysis
of variance,” Journal of the American Statistical Association, 32 (1937), 675-701.

2 Ruth Mac Farlane, Wage Rate Differentials: Comparative Data for Los Angeles and Other Urban
Areas (Los Angeles: The Haynes Foundation, 1946).


Punch-card Methods. Harry P. Hartkemeier. Dubuque, Iowa: Wm. C. Brown 
Company, 1952. Pp. xvii, 360. $5.00. Paper. 


P. C. Hammer, University of Wisconsin 


The subtitle of this book is “How to Use and Operate Punching, Sorting,
Electronic Statistical, Tabulating, and Accounting Machines Including
Types 24, 26, 75, 80, 82, 101, 402, 403, and 407.” All the machines discussed
are IBM models. Basically the book is an illustrated reassembly of materials
contained in the manuals provided free of charge by the IBM Corporation.
For purposes of instruction this arrangement may be better than
the individual manuals.

The book is slanted toward commerce and accounting students, the prob- 
lems being primarily in those fields of interest. The most difficult mathe- 
matical problem dealt with seems to be progressive digiting. Since there is a 
decided lack of good expository material on punch-card methods, more
books in the field are indicated. However, the reviewer feels that the author
has neglected some of the basic punch-card equipment and methods in this 
book. 
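Progressive digiting, as the term is usually used in punch-card work, obtains a sum of products such as Σxy by additions alone. The cards are sorted on one digit of x, and the progressive (cumulative) totals of y read off at each digit level are themselves added. The Python sketch below simulates that idea under this assumption. It is not a procedure taken from Hartkemeier's book, and the function name is hypothetical.

```python
def progressive_digiting_sum(xs, ys):
    """Sum of x * y for non-negative integers x, using additions and decimal shifts only.

    For each decimal digit position of x (units, tens, ...), the cards are "sorted"
    on that digit from 9 down to 1. A progressive total of y is carried down the
    digit levels and its value is added at every level, so a card whose digit
    is d contributes its y exactly d times.
    """
    xs = list(xs)
    total = 0
    place = 1  # value of the current digit position: 1, 10, 100, ...
    while any(x > 0 for x in xs):
        digits = [x % 10 for x in xs]
        running = 0    # progressive total of y over cards whose digit is >= d
        digit_sum = 0  # sum of the progressive totals read off at d = 9, 8, ..., 1
        for d in range(9, 0, -1):
            running += sum(y for dig, y in zip(digits, ys) if dig == d)
            digit_sum += running
        total += digit_sum * place  # on the machine, a shift of columns
        xs = [x // 10 for x in xs]
        place *= 10
    return total


xs, ys = [12, 5, 0, 37], [3, 4, 7, 2]
print(progressive_digiting_sum(xs, ys), sum(x * y for x, y in zip(xs, ys)))  # prints 130 130
```

Setting ys equal to xs gives Σx² in the same way.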

For example, the summary punches, the collators, reproducers, and the 
calculating punches (602A and 604) are not discussed although each is of 
great usefulness in statistical and commercial work. The author gives the
impression that all machines not discussed are no longer being manufac- 
tured. This is by no means the case; all the additional machines mentioned 
above are still in production. 

Since the book neglects so many virtually essential machines for scientific 
and accounting practice it cannot be recommended as a text without exten- 
sive supplementary materials. 

The book is reproduced by photo-offset and has a soft paper cover and a 
permanent loose-leaf binding. The text is well written in view of the neces- 
sarily segmented character of such manuals. 
Associated Measurements. M. H. Quenouille. New York: Academic Press Inc.,
1952. Pp. x, 242. 


Isadore Blumen, Cornell University


Here is a handbook on correlation, regression, and related topics which
many statisticians will find useful. The formal layout of analyses, hints
on practical pitfalls, and the details of manipulative technique are illustrated 
through the copious use of numerical examples. 

Unfortunately, the very nature of such a handbook seems to have forced 
the omission of so much in the way of basic ideas that it cannot be recom- 
mended to non-specialists. The statistician will want the book in order to 
have details readily available. The more general reader, however, will be 
bothered by such fundamental problems as the rationale for the choice be- 
tween various methods proposed. For him there are not always clear answers. 

The book is divided into four parts. The first forty-eight pages contain 
those “quick and dirty” methods which the author finds most useful and 
which require only plotted data. A section on similar numerical methods is 
included later under the heading of grouping observations. Included in these 
sections are graphical methods for bivariate and multiple correlation
problems and curvilinear regression, a few non-parametric tests, and some
devices adapted to situations where data are easy to obtain, great accuracy
is not wanted, and simple computation is important. Omissions, due apparently
both to the organization of the book and to the desire to keep it down to
manageable size, include most non-parametric procedures, e.g., the rank
correlation coefficient (which name the author chooses to bestow on
Kendall’s tau) and runs tests. Biserial, tetrachoric, and related correlation
procedures are not mentioned.

The second part covers the conventional topics: bivariate correlation and 
regression, multiple and partial correlation, and curvilinearity. This is well 
done, although one might quibble that the author’s treatment of such prob- 
lems as identifiability of parameters is not strictly accurate, the conditions 
he gives being sufficient rather than necessary. The problem of using corre- 
lated variates for screening and selection, as in personnel testing and genetics, 
is not treated. 

The third part includes grouping, analysis of covariance, and general 
pointers on the organization of investigations. Readers who have not been 
exposed to covariance before should be warned that this exposition is not 
particularly lucid. 

The last sixty or so pages deal with a variety of problems. The section 
on multivariate analysis is remarkably well done for so condensed a
non-mathematical discussion. The problems of time series, not being easily
reducible to elementary terms, are somewhat less satisfactorily treated but
this section will nevertheless be quite useful. There is also a section devoted 
to a variety of hints and comments. 

Most of the more desirable tables are included as is a fairly extensive, 
but not selected, bibliography. Teachers who use this as a laboratory manual
in their courses, for which the book is well adapted, may complain of a lack 
of problems for students to work out for themselves. It would also be desir- 
able if in future editions the conventional names for tests and procedures 
were used. 

From an over-all point of view, the reviewer was disturbed by the lack of 
discussion of alternative hypotheses, of the power of various tests proposed
and of the relative quality of various estimation procedures. Why reject 
extreme observations in one case and not in another? Why use a graphical 
method instead of the more common estimates? Why choose one non- 
parametric device over another? Surely more thoughtful answers could have 
been provided by so competent an author. 


Hypothesis Testing in Time Series Analysis. Peter Whittle. New York: Hafner 
Publishing Company, 1951. Pp. 120. $3.50. Paper. 


John Gurland, Iowa State College


On the subject of time series analysis, where the paucity of suitable
statistical tests is conspicuous, a book of this sort is a welcome addition to
the literature. It abounds in suggestions and ideas which should stimulate 
more research in this area. 

The spirit of the book is commendable. It attempts to give a general
rationale for discriminating between different random structures which might
be regarded as having generated the same observed time series. The null
hypothesis and the alternatives are always stated explicitly. Then tests are 
constructed which presumably are optimal in some sense. Some ingenious 
ideas and devices are propounded but Whittle is somewhat carried away in 
his enthusiasm, with the result that clarity and rigor are sometimes sacrificed 
for expediency. The reader of this book should be warned that this is not a 
book which may be read uncritically, but rather one which should be read 
with caution and reserve. 

The first two chapters comprise a brief review of some important results 
in statistics and probability theory, with a few sketchy proofs. Chapter 1 
outlines the testing of hypotheses and the construction of a most powerful 
critical region by means of a sufficient estimator. The second chapter re- 
views the notion of a spectrum for a stationary stochastic process and gives 
the corresponding spectral expansion of the process. The discussion centers 
mainly on a discrete process as this is the type considered throughout the 
book. By restricting the spectral density to be a rational function of $z = e^{i\omega}$,
it is shown that the corresponding stochastic process is either an auto- 
regressive scheme or a moving average scheme or a certain generalization of 
these. 

In Chapter 3 a most powerful test is constructed, on the assumption of 
an underlying normal distribution, for testing whether N consecutive
observations have a particular covariance matrix. The test criterion is a ratio of
quadratic forms, and is the same as that given by Lehmann and Stein 
(Annals of Mathematical Statistics, 1948, p. 504). If the covariance matrix is 
assumed to be a Laurent matrix (as is the case for a stationary process), and 
N is large, a test function is constructed which is a ratio of linear functions of 
the empirical covariances, and which is simpler to compute than the afore- 
mentioned ratio of quadratic forms. The large sample distribution of the test 
is given and Whittle states that this test in the case of a stationary process 
has “practically the same power” as the exact test mentioned above. The 
reviewer cannot resist wondering what happens in the case of small or
moderate values of N, since both the construction of the test and the
distribution of the criterion assume large values of N.

Chapter 4 gives some ingenious approximate methods for obtaining the
inverse of a Laurent matrix and its latent roots. Circulant matrices are used
in the approximation and the spectral density of the process is elegantly
applied. It is not at all clear, however, how good the approximations are. As for
the approximate distributions of quadratic forms and ratios of quadratic 
forms given in this chapter, the reviewer would like to make a few comments. 
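The circulant device can be illustrated numerically. The Python sketch below assumes NumPy and uses an illustrative first-order moving average with θ = 0.5 and N = 40, not an example from the book. It builds the Laurent (Toeplitz) covariance matrix of N consecutive observations and a circulant matrix obtained by wrapping the autocovariances around. The latent roots of a symmetric circulant are the discrete Fourier transform of its first row, that is, 2π times the spectral density at the frequencies 2πj/N. Comparing them with the exact roots gives one informal measure of the approximation whose accuracy the reviewer questions; Whittle's own construction may differ in detail.

```python
import numpy as np


def laurent_matrix(gamma, n):
    """n x n covariance matrix T[i, j] = gamma[|i - j|] (zero beyond the listed lags)."""
    T = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            lag = abs(i - j)
            if lag < len(gamma):
                T[i, j] = gamma[lag]
    return T


def circulant_approximation(gamma, n):
    """Circulant whose first row places gamma[k] at positions k and n - k."""
    row = np.zeros(n)
    row[0] = gamma[0]
    for k in range(1, len(gamma)):
        row[k % n] += gamma[k]
        row[(-k) % n] += gamma[k]
    # Row i is the first row shifted right by i places.
    C = np.array([np.roll(row, i) for i in range(n)])
    return C, row


def circulant_roots(row):
    """Latent roots of a symmetric circulant: the DFT of its first row, sorted."""
    return np.sort(np.fft.fft(row).real)


# x_t = e_t + theta * e_{t-1} with unit innovation variance (illustrative values).
theta, n = 0.5, 40
gamma = [1 + theta**2, theta]
T = laurent_matrix(gamma, n)
C, row = circulant_approximation(gamma, n)

exact = np.sort(np.linalg.eigvalsh(T))
approx = circulant_roots(row)
print(np.max(np.abs(exact - approx)))  # largest discrepancy between the two sets of roots
```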

In the case of a quadratic form Whittle’s proposal of employing the Edge- 
worth form of the Gram-Charlier series requires investigation. So far as this 
reviewer is aware there is no published theory which establishes the validity
of this series as an asymptotic expansion for the distribution of a quadratic 
form in correlated random variables. P. L. Hsu (Annals of Mathematical 
Statistics, 1945, 1946) proves some theorems for certain special quadratic 
forms of independent random variables which justify such asymptotic 
expansion in these cases. For the case of normally distributed variables the 
reviewer has some papers in the process of publication which show how a
Laguerrian expansion may be used so as to actually converge to the distri- 
bution function of a quadratic form. 
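For reference, the Edgeworth form of the Gram-Charlier series is usually written as follows for a variable standardized to zero mean and unit variance, with skewness γ₁ and excess kurtosis γ₂. This is the standard textbook form, not necessarily the exact expression Whittle uses. The reviewer's point is that its validity as an asymptotic expansion for a quadratic form in correlated variables had not been established.

```latex
f(x) \approx \phi(x)\left[1 + \frac{\gamma_1}{6}\,He_3(x) + \frac{\gamma_2}{24}\,He_4(x)
      + \frac{\gamma_1^2}{72}\,He_6(x)\right],
\qquad \phi(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2},

He_3(x) = x^3 - 3x, \quad He_4(x) = x^4 - 6x^2 + 3, \quad He_6(x) = x^6 - 15x^4 + 45x^2 - 15.
```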

In regard to a ratio of quadratic forms in correlated normal variables, 
Whittle recommends finding the moments, then using a Gram-Charlier ex- 
pansion to approximate the distribution. This method apparently works 
well in the numerical example given in Chapter 5, but as a general method it
has some inherent difficulties which might be pointed out here. In the first 
place, the problem of finding the moments is, in general, prohibitive. For 
the special cases considered by Whittle the distribution is required for inde- 
pendent variables, and the denominator is such that the ratio is distributed 
independently of the denominator. These circumstances greatly simplify the 
problem. Ordinarily, however, the integrals which represent the moments 
are of hyperelliptic type. In the second place, the validity of the asymptotic 
expansion is suspect. If the range of the ratio is finite then a condition of 
Cramér’s is satisfied which assures that the Gram-Charlier series converges 
to the distribution function. If the range is not finite, then the same ques- 
tions as above regarding the convergence and the validity of the asymptotic 
expansion apply. 
Some further remarks concerning this chapter are apropos regarding the
assumption of circularity. A rather sweeping statement appears in referring 
to R. L. Anderson’s distribution, to the effect that the assumption of circular- 
ity is “really no great drawback as N would have to be quite small before the 
power would be seriously diminished by this assumption.” What is meant by 
the terms “really no great drawback,” “quite small,” “seriously diminished,” 
is vague here. If N must not be “quite small” for the assumption of circular- 
ity to be “seriously” questioned, then one could ask whether or not the nor- 
mal distribution is an adequate approximation. If, besides the circularity 
assumption, the characteristic function is “smoothed,” as suggested by 
Koopmans (Annals of Mathematical Statistics, 1942) or more generally as ex- 
tended by Whittle, then one may well wonder how far astray the resulting ap- 
proximation is from the original distribution before circularity and smooth- 
ing were applied. How large or small N must be and what the corresponding 
effect will be on the power and on the true distribution is indeed a moot 
question and one which, in the present stage of development of the theory,
is usually answered by conjectures which to this reviewer seem unduly 
optimistic. 
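To see what the circularity assumption changes, the Python sketch below compares the circular serial correlation coefficient (in a mean-corrected form, assumed here; this is the statistic whose normal-theory distribution R. L. Anderson derived) with the ordinary non-circular version. The series is hypothetical and deliberately very short. With so few terms the single wrap-around product dominates, which is the kind of small-N effect the reviewer is asking about.

```python
from statistics import fmean


def _deviations(x):
    """Deviations from the sample mean."""
    xbar = fmean(x)
    return [v - xbar for v in x]


def circular_serial_correlation(x, lag=1):
    """Serial correlation with x_{N+t} identified with x_t (the circular definition)."""
    d = _deviations(x)
    n = len(d)
    num = sum(d[t] * d[(t + lag) % n] for t in range(n))
    return num / sum(v * v for v in d)


def noncircular_serial_correlation(x, lag=1):
    """The same ratio without wrap-around: pairs crossing the end of the series are dropped."""
    d = _deviations(x)
    n = len(d)
    num = sum(d[t] * d[t + lag] for t in range(n - lag))
    return num / sum(v * v for v in d)


series = [1, 2, 3, 4]  # a short, steadily rising, hypothetical series
print(circular_serial_correlation(series), noncircular_serial_correlation(series))  # prints -0.2 0.25
```

For long series the single extra product matters little, so the two definitions converge; how long is "long enough" is exactly the open question raised above.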

Chapter 5 provides a numerical example for a test of randomness against 
certain alternative hypotheses. The approximative methods developed in 
the earlier chapters are used, and seem to work quite well for this example. 

Chapters 6 and 7 are entitled “Non-parametric Discrimination” and are 
devoted mainly to the problem of constructing suitable tests regarding the 
structure of a process. The title of these chapters is misleading because the 
tests are constructed from a probability density involving unknown parame- 
ters and, as such, are parametric tests in the conventional sense of the term. 
Whittle, in fact, assumes the parameters have a probability distribution and 
proceeds to apply Bayes’ theorem to construct a posteriori likelihood
functions, then uses a likelihood ratio of such functions. It is surprising that such
an anachronistic approach could have found its way in so recent and other- 
wise modern a book. By assuming that N is large and choosing a convenient 
distribution for the parameters various test criteria are obtained. Many of 
the tests could be constructed directly without appealing to Bayes’ theorem. 
Among the tests considered are the following: (a) Test the order of a moving 
average scheme against the alternative of a different order. (b) Test the 
order of an autoregressive scheme against the alternative of a different 
order. (c) Test whether a process is an autoregressive scheme of a fixed order 
against the alternative of a moving average scheme of a fixed order. (d) 
Opposite of (c). 

In Chapter 8 numerical examples of (c) and (d) are discussed and in 
Chapters 9 and 10 the methods of the earlier chapters are applied to con- 
struct periodogram tests and tests of fit, respectively. 

The final chapter “Indeterminacies in model structures” provides an inter- 
esting investigation into the non-uniqueness of the linear structure of a 
stochastic process for a given covariance function. In the case of a Gaussian 
process, the indeterminacy is inherent; however, if the process has some
non-zero cumulants of higher order than the second, a method of discrimination is
proposed. 

In conclusion, this reviewer would like to quote from F. N. David’s review
of this book which appeared in Biometrika, Vol. 39, May, 1952: “... However
it is more than sufficient to say that Mr. Whittle is a pioneer and it
has always been the fate of pioneers both to stimulate those who follow and
to be criticized by those who are wise after the event. All who are interested 
in time series will benefit by reading the book if only from the stimulation 
and excitement which come from trying to go one better than the author. 
This is an important contribution to the research work on time series and 
may well prove to be the foundation stone of a satisfactory theory.” 


Tables of Poisson Distribution. Tosio Kitagawa. Tokyo, Japan: Baifukan, 
1952. Pp. xii, 156. $3.50. 


William G. Cochran, Johns Hopkins University


These tables give the individual terms $e^{-m}m^x/x!$ of the Poisson
distribution. Unlike E. C. Molina’s tables (Poisson’s Exponential Binomial
Limit, D. Van Nostrand, New York, Fifth printing, 1949), they do not 
contain cumulative sums, and they stop at m=10, whereas Molina goes up 
to m=100. However, the interval of tabulation is much smaller than 
Molina’s, being only 0.001 in m up to m=1, and thereafter 0.01 up to m= 10. 
The following table compares the intervals available and the number of 
decimal places (D.P.) given by each author.

                   Tabulation interval     Decimal places (D.P.)
Range of m         Kitagawa    Molina      Kitagawa    Molina
0.001 - 0.010      0.001       0.001       8
0.010 - 0.300      0.001       0.01        8
0.300 - 1.000      0.001       0.1
1.00  - 5.00       0.01        0.1
5.00  - 10.00      0.01        0.1
10.0  - 15.0                   0.1
15    - 100                    1.0
For anyone engaged in accurate computations with small values of m,
Kitagawa’s tables are a valuable addition to the library. They are attrac- 
tively printed, with ample separation of the figures so as to diminish eye- 
strain and copying mistakes. The text is in English. 
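The individual terms tabulated by Kitagawa can be generated with a short Python sketch that relies only on the defining formula. The recurrence used is a standard computational device and is not necessarily the method Kitagawa used. The values of m and the eight-decimal display are illustrative choices.

```python
import math


def poisson_terms(m, x_max):
    """Individual terms e^{-m} m^x / x! for x = 0, 1, ..., x_max.

    Uses the recurrence p(x) = p(x - 1) * m / x, which avoids large factorials.
    """
    p = math.exp(-m)
    terms = [p]
    for x in range(1, x_max + 1):
        p *= m / x
        terms.append(p)
    return terms


# Small values of m, where a fine interval of tabulation matters (illustrative).
for m in (0.005, 0.125, 1.0):
    print(m, [f"{t:.8f}" for t in poisson_terms(m, 4)])
```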

Additional tables give single and double inspection plans. These plans are 
based on the same principle as those by Paul Peach (Industrial Statistics and 
Quality Control, Edwards and Broughton Co., Raleigh, 1947), but were
developed independently by Kitagawa. If α = consumer’s risk and β = producer’s
risk, Kitagawa gives tables for finding the sample size (or sizes) and the
rejection number (or numbers) for α = 0.1, β = 0.1; α = 0.1, β = 0.01; and
α = 0.01, β = 0.01, whereas Peach’s tables have α = β = 0.05.

Stechert-Hafner, Inc., 31 East 10th Street, New York 3, inform me that 
they hope to have a supply of Kitagawa’s tables at $3.50 each. From my 
correspondence with the Baifukan Company, it appears that the company 
does not wish to promote direct sales from Japan. 


50-100 Binomial Tables. Harry G. Romig (Quality Manager, Hughes Aircraft 
Company, Culver City, California). John Wiley & Sons, Inc., 1953. Pp. xxvii, 
172. $4.00. 


These tables show to six decimal places the individual and cumulative
terms of the binomial distribution for probabilities from 0.01 to 0.50 in
steps of 0.01 (from which, of course, values from 0.50 to 1.00 in steps of 
0.01 are readily obtained) and for sample sizes from 50 to 100 in steps of 5. 
The introduction defines the binomial distribution, discusses its relation to 
the hypergeometric and Incomplete Beta-function, explains the notation 
used in the tables, describes the procedures used in computing the tables 
and their accuracy, and gives directions for using the tables and for inter- 
polating into them, together with examples. 
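A Python sketch of the quantities in Romig's tables, using only the definition of the binomial distribution. Whether the cumulative column runs as P(X ≤ x) or P(X ≥ x) is an assumption here (P(X ≤ x) is used), and n = 50, p = 0.30 is simply one illustrative entry. The check in the middle shows the reflection b(x; n, p) = b(n - x; n, 1 - p), which is why values for p from 0.50 to 1.00 are readily obtained.

```python
from math import comb


def binomial_terms(n, p):
    """Individual terms b(x; n, p) = C(n, x) p^x (1 - p)^(n - x), x = 0, ..., n."""
    return [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]


def cumulative_terms(terms):
    """Running totals P(X <= x) (the direction of cumulation is an assumption)."""
    out, total = [], 0.0
    for t in terms:
        total += t
        out.append(total)
    return out


n, p = 50, 0.30  # one sample size and probability within the range of the tables
b = binomial_terms(n, p)
b_reflected = binomial_terms(n, 1 - p)

# b(x; n, p) = b(n - x; n, 1 - p): tabulating p up to 0.50 suffices.
assert all(abs(b[x] - b_reflected[n - x]) < 1e-12 for x in range(n + 1))
print(f"{b[15]:.6f} {cumulative_terms(b)[15]:.6f}")
```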

The Government Printing Office has recently printed a far more extensive 
table of the binomial distribution—giving, however, only cumulative prob- 
abilities—with entries to seven decimals for the same probabilities covered in 
Romig’s table and for sample sizes from 1 to 150 inclusive, by steps of 1. 
Apparently this table (Ordnance Pamphlet ORDP 20-1, Tables of the Cumu- 
lative Binomial Probabilities, September 1952) is to be made available to the 
public, in which case a more definite notice will be included in this Review 
Section. 

W.A.W. 


Confidence Limits Tables for Samples of Binomially Distributed Data. John 
Folger (Chief, Technical Services Division, Human Resources Institute, Max- 
well Air Force Base, Alabama). Maxwell Air Force Base, Alabama: Human 
Resources Institute, May 1953. Pp. 12. 


These tables give 95 per cent confidence intervals for sample sizes from 5
through 49 by steps of 1, and for all possible numbers of successes. “These 
confidence limits tables were prepared from Tables of Binomial Probability
Distribution, National Bureau of Standards, Applied Mathematics Series 
6.” No exact definition of the confidence intervals is given, nor any further 
account of the method of computing. Presumably, however, the intervals 
are such that not less than 2.5 per cent lies in each tail.
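Because the pamphlet does not define its intervals, the following is only a guess at one possible construction. It is a Python sketch of the usual equal-tailed exact (Clopper-Pearson) limits, found by bisection on the cumulative binomial, for which each tail probability at the limit equals 2.5 per cent. Whether Folger's tables were built this way is an assumption.

```python
from math import comb


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def bisect_root(f, lo, hi, tol=1e-12):
    """Root of a continuous f on [lo, hi], assuming f(lo) and f(hi) differ in sign."""
    f_lo_positive = f(lo) > 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (f(mid) > 0) == f_lo_positive:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_binomial_limits(x, n, conf=0.95):
    """Equal-tailed exact confidence limits for p given x successes in n trials."""
    tail = (1 - conf) / 2
    # Lower limit: the p at which P(X >= x) equals the tail probability.
    lower = 0.0 if x == 0 else bisect_root(lambda p: 1 - binom_cdf(x - 1, n, p) - tail, 0.0, 1.0)
    # Upper limit: the p at which P(X <= x) equals the tail probability.
    upper = 1.0 if x == n else bisect_root(lambda p: binom_cdf(x, n, p) - tail, 0.0, 1.0)
    return lower, upper


for x in (0, 3, 10):
    lo, hi = exact_binomial_limits(x, 10)
    print(x, round(lo, 4), round(hi, 4))
```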

W.A.W. 


Cambridge Elementary Statistical Tables. D. V. Lindley and J. C. P. Miller. 
Cambridge (England): Cambridge University Press, 1953. Pp. 35. $1.00. Paper
bound.


As stated in the Preface, “This set of tables is concerned only with the
commoner and more familiar and elementary of the many statistical 
functions and tests of significance now available.” 

Table 1 shows cumulative normal probabilities to 5 decimals for arguments
0(0.01)3.0(0.1)4 and for all arguments above 3.731, together with a brief
tabulation of the normal frequency function. Table 2 gives the one-tail
percentage points of the normal distribution function for selected percentages.
Tables 3, 5, and 7 give percentage points of the $t$-, $\chi^2$-, and $F$-distributions.

Table 4 gives the normalizing transformation for correlation coefficients. 
Table 6 gives a means of estimating the standard deviation of a normal 
population from the range of a small sample (13 or less). Table 8 gives 4,000 
random digits. Table 9 gives the square, square root, reciprocal, reciprocal 
square root, and common logarithm and antilogarithm of each integer to 
1000; it also gives inverse circular and hyperbolic root-sine transformations. 
Table 10 gives logarithms of factorials to the base 10. 
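Two of the entries described above correspond to short formulas, sketched here in Python. The first is Fisher's normalizing transformation of a correlation coefficient, z = ½ ln((1 + r)/(1 - r)). The second estimates a normal standard deviation as the sample range divided by the mean range d₂ for samples of that size. The d₂ values are the conventional constants, supplied here as an assumption and only for n ≤ 10; they are not copied from Table 6, and the sample is hypothetical.

```python
import math

# Mean range of samples of n from a normal population with unit standard deviation
# (the conventional "d2" constants; assumed values, not taken from Table 6).
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326, 6: 2.534,
      7: 2.704, 8: 2.847, 9: 2.970, 10: 3.078}


def fisher_z(r):
    """Fisher's normalizing transformation z = (1/2) ln((1 + r) / (1 - r)) = artanh(r)."""
    return math.atanh(r)


def sigma_from_range(sample):
    """Estimate a normal standard deviation as (sample range) / d2 for a small sample."""
    n = len(sample)
    return (max(sample) - min(sample)) / D2[n]


print(round(fisher_z(0.5), 4))  # prints 0.5493
print(round(sigma_from_range([9.1, 10.4, 8.7, 11.2, 10.0]), 3))
```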

It is the hope of the authors that “the values provided will meet the ma- 
jority of the needs of many users of statistical methods in scientific research, 
technology, and industry in a compact and handy form,” and that they will 
be convenient for the teaching and study of statistics in schools and
universities.

M.A.L. 


County and City Data Book, 1952. A Statistical Abstract Supplement. Prepared 
under the direction of Morris B. Ullman (Chief, Statistical Reports Section, 
Bureau of the Census). Washington: United States Government Printing Office,
1953. Pp. xxx, 608. $4.25.


According to the Introduction, “This volume is one of a series of
supplements to the Statistical Abstract of the United States, and is designed to
meet the need for summary statistics for small geographic areas. Compactly 
assembled in this volume are 128 items of data for each county, standard 
metropolitan area, State, and geographic division; and 133 items of data 
for each of 484 cities having 25,000 or more inhabitants in 1950. Also in- 
cluded is a table showing the number of inhabitants of all urban places 
(mostly incorporated places of 2,500 inhabitants or more) in 1950....
The year, 1952, used to designate this edition denotes the year during which 
compilation of the statistics occurred.” The title page notes: “Statistics in- 
cluded: For 1950, Agriculture, Area and Population, Banking, City Govern- 
ment Finances and Employment, Construction, Education, Family Income, 
Housing, Labor Force, Vital Statistics, and other subjects; for 1947 and 
1950, Manufactures; for 1948, Trade and Services; and Climate.” 
W.A.W. 


Bibliographie sur la méthode statistique et ses applications. G. Darmois and
E. Morice, editors. Paris: Institut International de Statistique and Institut
National de la Statistique et des Etudes Economiques, 1952. Pp. 49. Paper bound.


This bibliography lists 75 works dealing with statistical method and its
applications which have been written in or, in two instances, translated 
into French, the majority within the last twenty years. The introduction 
apologizes to authors whose works may not have been cited, explaining that 
the bibliography could not be exhaustive. 

The bibliography has been divided into two sections: (1) General
Methods, and (2) Applications. Under the first heading are (a) elementary works,
(b) intermediate works, (c) advanced works on theory, (d) elementary 
probability, and (e) probability theory. The second section includes (f) 
economics and insurance, (g) industry and agriculture, (h) demography, 
(i) medicine, biology, and psychology, and (j) mechanics and astronomy. 

Besides the usual bibliographical details of author, publisher, date of pub- 
lication, number of pages, and price, this bibliography gives the table of 
contents of each work, with a summary of what is included in each part. 

M.A.L. 